At the beginning of summer, we put B2 Copy File APIs into beta. We’re pleased to announce the end of the beta and that the APIs are all now public!
We had a number of people use the beta features and give us great feedback. In fact, because of that feedback, we were able to add a new capability before release.
New Feature — Bucket to Bucket Copies
Initially, our guidance was that these new APIs were only to be used within the same B2 bucket, but in response to customer and partner feedback, we added the ability to copy files from one bucket to another bucket within the same account.
To use this new feature with b2_copy_file, simply pass in the destinationBucketId where the new file copy will be stored. If this is not set, the copied file defaults to the same bucket as the source file. Within b2_copy_part, there is a subtle difference in that the Source File ID can belong to a different bucket than the Large File ID.
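As a sketch of how this looks in practice, here is a minimal Python helper that builds the JSON body for a b2_copy_file call. The field names (sourceFileId, fileName, destinationBucketId) follow the B2 native API described above; the ID values are made-up placeholders, and the helper itself is our illustration, not part of any SDK.

```python
import json

def build_copy_file_request(source_file_id, new_file_name, destination_bucket_id=None):
    """Build the JSON body for a b2_copy_file call.

    If destination_bucket_id is None, B2 defaults the copy to the
    source file's bucket, as described above.
    """
    body = {"sourceFileId": source_file_id, "fileName": new_file_name}
    if destination_bucket_id is not None:
        body["destinationBucketId"] = destination_bucket_id
    return json.dumps(body)

# Copy a file into a different bucket within the same account
# (placeholder IDs; substitute real values from your account):
print(build_copy_file_request("SOURCE_FILE_ID", "backups/copy-of-report.pdf",
                              destination_bucket_id="DESTINATION_BUCKET_ID"))
```

The same pattern applies to b2_copy_part, except that the body carries the Source File ID, the Large File ID, and the part number.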
For the complete API documentation, refer to the Backblaze B2 docs online.
In a literal sense, the new capability enables you to create a new file (or new part of a large file) that is a copy of an existing file (or range of an existing file). You can either copy over the source file’s metadata or specify new metadata for the new file that is created. This all occurs without having to download or re-upload any data.
This has been one of our most requested features as it unlocks:
Rename/Re-organize. The new capabilities give customers the ability to re-organize their files without having to download and re-upload. This is especially helpful when trying to mirror the contents of a file system to B2.
Synthetic Backup. With the ability to copy ranges of a file, users can now leverage B2 for synthetic backup, i.e. uploading a full backup but then only uploading incremental changes (as opposed to re-uploading the whole file with every change). This is particularly helpful for applications like backing up VMs where re-uploading the entirety of the file every time it changes can be inefficient.
While many of our customers directly leverage our APIs, just as many use 3rd party software (B2 Integration Partners) to facilitate storage into B2. Our Integration Partners were very helpful and active in giving us feedback during the beta. Here are some highlights of partners already supporting the copy_file feature:
Transmit: macOS file transfer/cloud storage application that supports high speed copying of data between your Mac and more than 15 different cloud services.
Rclone: Billed as “rsync for cloud storage,” Rclone is a powerful command line tool to copy and sync files to and from local disk, SFTP servers, and many cloud storage providers.
Mountain Duck: Mount server and cloud storage as a disk (Finder on macOS; File Explorer on Windows). With Mountain Duck, you can also open remote files with any application as if the file were on a local volume.
Cyberduck: File transfer/cloud storage browser for Mac and Windows with support for more than 10 different cloud services.
At the end of Q2 2019, Backblaze was using 108,660 hard drives to store data. For our evaluation we remove from consideration those drives that were used for testing purposes and those drive models for which we did not have at least 60 drives (see why below). This leaves us with 108,461 hard drives. The table below covers what happened in Q2 2019.
Notes and Observations
If a drive model has a failure rate of 0 percent, it means there were no drive failures of that model during Q2 2019 — lifetime failure rates are later in this report. The two drives listed with zero failures in Q2 were the 4 TB and 14 TB Toshiba models. The Toshiba 4 TB drive doesn’t have a large enough number of drives or drive days to be statistically reliable, but only one drive of that model has failed in the last three years. We’ll dig into the 14 TB Toshiba drive stats a little later in the report.
There were 199 drives (108,660 minus 108,461) that were not included in the list above because they were used as testing drives or we did not have at least 60 of a given drive model. We now use 60 drives of the same model as the minimum number when we report quarterly, yearly, and lifetime drive statistics as there are 60 drives in all newly deployed Storage Pods — older Storage Pod models had a minimum of 45.
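For readers who want to reproduce the quarterly numbers, an annualized failure rate can be computed from failures and drive days; this sketch uses that standard drive-day formula with made-up example numbers, not figures from the tables above.

```python
def annualized_failure_rate(failures: int, drive_days: float) -> float:
    """Failures per drive-day, scaled to a full year, as a percentage."""
    return 100.0 * failures * 365.0 / drive_days

# Illustrative only: 2 failures observed over 120,000 drive days
print(f"{annualized_failure_rate(2, 120_000):.2f}%")  # 0.61%
```

This is also why a 60-drive minimum matters: with too few drives (and therefore too few drive days), a single failure swings the rate wildly.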
2,000 Backblaze Storage Pods? Almost…
We currently have 1,980 Storage Pods in operation. All are version 5 or version 6, as we recently gave away nearly all of the older Storage Pods to folks who stopped by our Sacramento storage facility. Nearly all, as we have a couple in our Storage Pod museum. There are currently 544 version 5 pods, each containing 45 data drives, and 1,436 version 6 pods, each containing 60 data drives. The next time we add a Backblaze Vault, which consists of 20 Storage Pods, we will have 2,000 Backblaze Storage Pods in operation.
Goodbye Western Digital
In Q2 2019, the last of the Western Digital 6 TB drives were retired from service. The average age of the drives was 50 months. These were the last of our Western Digital branded data drives. When Backblaze was first starting out, the first data drives we deployed en masse were Western Digital Green 1 TB drives. So it was with a bit of sadness that we watched our Western Digital data drive count go to zero. We hope to see them again in the future.
Hello “Western Digital”
While the Western Digital brand is gone, the HGST brand (owned by Western Digital) is going strong as we still have plenty of the HGST branded drives, about 20 percent of our farm, ranging in size from 4 to 12 TB. In fact, we added over 4,700 HGST 12 TB drives in this quarter.
This just in: rumor has it there are twenty 14 TB Western Digital Ultrastar drives being readied for deployment and testing in one of our data centers. It appears Western Digital has returned: stay tuned.
Goodbye 5 TB Drives
Back in Q1 2015, we deployed 45 Toshiba 5 TB drives. They were the only 5 TB drives we deployed as the manufacturers quickly moved on to larger capacity drives, and so did we. Yet, during their four plus years of deployment only two failed, with no failures since Q2 of 2016 — three years ago. This made it hard to say goodbye, but buying, stocking, and keeping track of a couple of 5 TB spare drives was not optimal, especially since these spares could not be used anywhere else. So yes, the Toshiba 5 TB drives were the odd ducks on our farm, but they were so good they got to stay for over four years.
Hello Again, Toshiba 14 TB Drives
We’ve mentioned the Toshiba 14 TB drives in previous reports; now we can dig in a little deeper, given that they have been deployed for almost nine months and we have some experience working with them. These drives got off to a bit of a rocky start, with six failures in the first three months of being deployed. Since then, there has been only one additional failure, with no failures reported in Q2 2019. The result is that the lifetime annualized failure rate for the Toshiba 14 TB drives has decreased to a very respectable 0.78%, as shown in the lifetime table in the following section.
Lifetime Hard Drive Stats
The table below shows the lifetime failure rates for the hard drive models we had in service as of June 30, 2019. This is over the period beginning in April 2013 and ending June 30, 2019.
The Hard Drive Stats Data
The complete data set used to create the information used in this review is available on our Hard Drive Test Data web page. You can download and use this data for free for your own purposes. All we ask are three things: 1) You cite Backblaze as the source if you use the data, 2) You accept that you are solely responsible for how you use the data, and 3) You do not sell this data to anyone; it is free. Good luck and let us know if you find anything interesting.
Ah, the iconic 3.5″ hard drive, now approaching a massive 16 TB of storage capacity. Backblaze Storage Pods fit 60 of these drives in a single pod, and with well over 750 petabytes of customer data under management in our data centers, we have a lot of hard drives to look after.
Yet most of us have just one, or only a few of these massive drives at a time storing our most valuable data. Just how safe are those hard drives in your office or studio? Have you ever thought about all the awful, terrible things that can happen to a hard drive? And what are they, exactly?
It turns out there are a host of obvious physical dangers, but also other, less obvious, errors that can affect the data stored on your hard drives, as well.
Dividing by One
It’s tempting to store all of your content on a single hard drive. After all, the capacity of these drives gets larger and larger, and they offer great performance of up to 150 MB/s. It’s true that flash-based drives (SSDs) are far faster, but their dollars-per-gigabyte price is also higher, so for now the traditional 3.5″ hard drive holds most of the world’s data.
However, having all of your precious content on a single, spinning hard drive is a true tightrope without a net experience. Here’s why.
Drivesavers Failure Analysis by the Numbers
I asked our friends at Drivesavers, specialists in recovering data from drives and other storage devices, for some analysis of the hard drives brought into their labs for recovery. What were the primary causes of failure?
Reason One: Media Damage
The number one reason, accounting for 70 percent of failures, is media damage, including full head crashes.
Modern hard drives stuff multiple, ultra thin platters inside that 3.5 inch metal package. These platters spin furiously at 5400 or 7200 revolutions per minute — that’s 90 or 120 revolutions per second! The heads that read and write magnetic data on them sweep back and forth only 6.3 micrometers above the surface of those platters. That gap is about 1/12th the width of a human hair and a miracle of modern technology to be sure. As you can imagine, a system with such close tolerances is vulnerable to sudden shock, as evidenced by Drivesavers’ results.
This damage occurs when the platters receive shock, i.e. physical damage from impact to the drive itself. Platters have been known to shatter, or have damage to their surfaces, including a phenomenon called head crash, where the flying heads slam into the surface of the platters. Whatever the cause, the thin platters holding 1s and 0s can’t be read.
It takes a surprisingly small amount of force to generate a lot of shock energy to a hard drive. I’ve seen drives fail after simply tipping over when stood on end. More typically, drives are accidentally pushed off of a desktop, or dropped while being carried around.
A drive might look fine after a drop, but the damage may have been done. Due to their rigid construction, heavy weight, and how often they’re dropped on hard, unforgiving surfaces, these drops can easily generate the equivalent of hundreds of g-forces to the delicate internals of a hard drive.
To paraphrase an old (and morbid) parachutist joke, it’s not the fall that gets you, it’s the sudden stop!
Reason Two: PCB Failure
The next largest cause is circuit board failure, accounting for 18 percent of failed drives. Printed circuit boards (PCBs), those tiny green boards seen on the underside of hard drives, can fail in the presence of moisture or static electric discharge like any other circuit board.
Reason Three: Stiction
Next up is stiction (a portmanteau of friction and sticking), which occurs when the armatures that drive those flying heads actually get stuck in place and refuse to operate, usually after a long period of disuse. Drivesavers found that stuck armatures accounted for 11 percent of hard drive failures.
It seems counterintuitive that a hard drive sitting quietly in a dark drawer might actually be contributing to its own failure, but I’ve seen many older hard drives pulled from a drawer and popped into a drive carrier or connected to power just go thunk. It does appear that hard drives like to be connected to power and constantly spinning, and the numbers seem to bear this out.
Reason Four: Motor Failure
The last, and least common cause of hard drive failure, is hard drive motor failure, accounting for only 1 percent of failures, testament again to modern manufacturing precision and reliability.
Mitigating Hard Drive Failure Risk
So now that you’ve seen the gory numbers, here are a few recommendations to guard against the physical causes of hard drive failure.
1. Have a physical drive handling plan and follow it rigorously
If you must keep content on single hard drives in your location, make sure your team follows a few guidelines to protect against moisture, static electricity, and drops during drive handling. Keeping the drives in a dry location, storing the drives in static bags, using static discharge mats and wristbands, and putting rubber mats under areas where you’re likely to accidentally drop drives can all help.
It’s worth reviewing how you physically store drives, as well. Drivesavers tells us that the sudden impact of a heavy drawer of hard drives being slammed shut or yanked open quickly can damage the drives inside!
2. Spread failure risk across more drives and systems
Improving physical hard drive handling procedures is only a small part of a good risk-reducing strategy. You can immediately reduce the exposure from a single hard drive failure by simply keeping a copy of that valuable content on another drive. This is a common approach for videographers moving content from cameras shooting in the field back to their editing environment. By simply copying content over from one fast drive to another, you make it far less likely that the content is lost, since both drives are unlikely to fail at once. This is certainly better than keeping content on only a single drive, but definitely not a great long-term solution.
Multiple drive NAS and RAID systems reduce the impact of failing drives even further. A RAID 6 system composed of eight drives not only has much faster read and write performance than a single drive, but two of its drives can fail and still serve your files, giving you time to replace those failed drives.
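The RAID 6 tradeoff reduces to simple arithmetic: two drives’ worth of capacity buys you tolerance of any two simultaneous failures. A quick sketch (illustrative; real arrays also reserve some space for metadata):

```python
def raid6_capacity(num_drives: int, drive_tb: float) -> float:
    """Usable capacity of a RAID 6 group: two drives' worth of space
    goes to parity, which lets any two drives fail without data loss."""
    if num_drives < 4:
        raise ValueError("RAID 6 needs at least 4 drives")
    return (num_drives - 2) * drive_tb

# An eight-drive RAID 6 group of 12 TB drives:
print(raid6_capacity(8, 12))  # 72 TB usable, while tolerating 2 failed drives
```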
Mitigating Data Corruption Risk
The Risk of Bit Flips
Beyond physical damage, there’s another threat to the files stored on hard disks: small, silent bit flip errors often called data corruption or bit rot.
Bit rot errors occur when individual bits in a stream of data in files change from one state to another (positive or negative, 0 to 1, and vice versa). These errors can happen to hard drive and flash storage systems at rest, or be introduced as a file is copied from one hard drive to another.
While hard drives automatically correct single-bit flips on the fly, larger bit flips can introduce a number of errors. This can either cause the program accessing them to halt or throw an error, or perhaps worse, lead you to think that the file with the errors is fine!
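A simple way to see how such errors are caught: record a hash of the data when you store it, and compare on read. This sketch flips one bit and shows the mismatch; it is a generic illustration of checksumming, not any particular product’s implementation.

```python
import hashlib

def digest(data: bytes) -> str:
    """Hex digest used as an integrity checksum for the data."""
    return hashlib.sha256(data).hexdigest()

payload = bytearray(b"precious video frame")
stored_digest = digest(bytes(payload))  # recorded at write time

payload[5] ^= 0x04  # silently flip a single bit, as bit rot would

corrupted = digest(bytes(payload)) != stored_digest
print("corruption detected:", corrupted)  # corruption detected: True
```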
Flash drives are not immune either. Bianca Schroeder recently published a study of flash drives, Flash Reliability in Production: The Expected and the Unexpected, and found that “…between 20-63% of drives experienced at least one of the (unrecoverable read errors) during the time it was in production. In addition, between 2-6 out of 1,000 drive days were affected.”
“These UREs are almost exclusively due to bit corruptions that ECC cannot correct. If a drive encounters a URE, the stored data cannot be read. This either results in a failed read in the user’s code, or if the drives are in a RAID group that has replication, then the data is read from a different drive.”
Exactly how prevalent bit flips are is a controversial subject, but if you’ve ever retrieved a file from an old hard drive or RAID system and see sparkles in video, corrupt document files, or lines or distortions in pictures, you’ve seen the results of these errors.
Protecting Against Bit Flip Errors
There are many approaches to catching and correcting bit flip errors. From a system designer standpoint they usually involve some combination of multiple disk storage systems, multiple copies of content, data integrity checks and corrections, including error-correcting code memory, physical component redundancy, and a file system that can tie it all together.
Backblaze has built such a system, and uses a number of techniques to detect and correct file degradation due to bit flips and deliver extremely high data durability and integrity, often in conjunction with Reed-Solomon erasure codes.
Thanks to the way object storage and Backblaze B2 works, files written to B2 are always retrieved exactly as you originally wrote them. If a file ever changes from the time you’ve written it, say, due to bit flip errors, it will either be reproduced from a redundant copy of your file, or even mathematically reconstructed with erasure codes.
So the simplest, and certainly least expensive way to get bit flip protection for the content sitting on your hard drives is to simply have another copy on cloud storage.
With some thought, you can apply these protection steps to your environment and get the best of both worlds: the performance of your content on fast, local hard drives, and the protection of having a copy on object storage offsite with the ultimate data integrity.
When shopping for a cloud storage provider, customers should ask a few key questions of potential storage providers. In addition to inquiring about storage cost, data center location, and features and capabilities of the service, they’re going to want to know the numbers for two key metrics for measuring cloud storage performance: durability and availability.
Think of durability as a measurement of how healthy and resilient your data is. You want your data to be as intact and pristine on the day you retrieve it as it was on the day you stored it.
There are a number of ways that data can lose its integrity.
1. Data loss
Data loss can happen through human accident, natural or manmade disaster, or even malicious action out of your control. Whether you store data in your home, office, or with a cloud provider, that data needs to be protected as much as possible from any event that could damage or destroy it. If your data is on a computer, external drive, or NAS in a home or office, you obviously want to keep the computing equipment away from water sources and other environmental hazards. You also have to consider the likelihood of fire, theft, and accidental deletion.
Data center managers go to great lengths to protect data under their care. That care starts with locating a facility in as safe a geographical location as possible, having secure facilities with controlled access, and monitoring and maintaining the storage infrastructure (chassis, drives, cables, power, cooling, etc.)
2. Data corruption
Data on traditional spinning hard drive systems can degrade with time, have errors introduced during copying, or become corrupted in any number of ways. File systems, operating systems, and utilities have ways to double check that data is handled correctly during common file and data handling operations, but corruption can sneak into a system if it isn’t monitored closely, or if the storage system doesn’t specifically check for such errors the way systems with ECC (Error Correcting Code) RAM do. Object storage systems commonly monitor for any changes in the data, and often will automatically repair the data or provide warnings when it has been changed.
How is Durability Measured?
Object storage providers express data durability as an annual percentage in nines, as in two nines before the decimal point and as many nines as warranted after the decimal point. For example, eleven nines of durability is expressed as 99.999999999%.
Of the major vendors, Azure claims 12 nines and even 16 nines durability for some services, while Amazon S3, Google Cloud Platform, and Backblaze offer 11 nines, or 99.999999999% annual durability.
What this means is that those services are promising that your data will remain intact while it is under their care, and no more than 0.000000001 percent of your data will be lost in a year (in the case of eleven nines annual durability).
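In concrete terms, N nines of annual durability means an annual loss probability of 10^-N per object. This sketch (our arithmetic, not a vendor formula) shows what that works out to:

```python
def expected_annual_loss(num_objects: int, nines: int) -> float:
    """Expected number of objects lost per year at 'nines' of annual durability."""
    annual_loss_probability = 10.0 ** (-nines)
    return num_objects * annual_loss_probability

# Storing one million objects at eleven nines of durability:
loss = expected_annual_loss(1_000_000, 11)
print(loss)  # about 0.00001 objects per year, i.e. one loss per ~100,000 years
```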
How is Durability Maintained?
Generally, there are two ways to maintain data durability. The first approach is to use software algorithms and metadata such as checksums to detect corruption of the data. If corruption is found, the data can be healed using the stored information. Examples of these approaches are erasure coding and Reed-Solomon coding.
Another tried and true method to ensure data integrity is to simply store multiple copies of the data in multiple locations. This is known as redundancy. This approach allows data to survive the loss or corruption of data in one or even multiple locations through accident, war, theft, or any manner of natural disaster or alien invasion. All that’s required is that at least one copy of the data remains intact. The odds for data survival increase with the number of copies stored, with multiple locations an important multiplying factor. If multiple copies (and locations) are lost, well, that means we’re all in a lot of trouble and perhaps there might be other things to think about than the data you have stored.
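The multiplying effect of copies is easy to see in a toy model: with independent copies, data is lost only if every copy is lost. The independence assumption is ours (and is exactly what storing copies in multiple locations helps approximate):

```python
def all_copies_lost(per_copy_annual_loss: float, num_copies: int) -> float:
    """Probability that every independent copy is lost within a year."""
    return per_copy_annual_loss ** num_copies

# If each copy independently has a 1% chance of loss in a year:
for copies in (1, 2, 3):
    print(copies, "copies ->", all_copies_lost(0.01, copies))
```

Going from one copy to three drops the loss probability from one in a hundred to roughly one in a million.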
The best approach is a combination of the above two approaches. Home data storage appliances such as NAS can provide the algorithmic protection through RAID and other technologies. If you store at least one copy of your data in a different location than your office or home, then you’ve got redundancy covered, as well. The redundant location can be as simple as a USB or hard drive you regularly drop off in your old bedroom’s closet at mom’s house, or a data center in another state that gets a daily backup from your office computer or network.
What is Availability?
If durability can be compared to how well your picnic basket contents survived the automobile trip to the beach, then you might get a good understanding of availability if you subsequently stand and watch that basket being carried out to sea by a wave. The chicken salad sandwich in the basket might be in great shape but you won’t be enjoying it.
Availability is how much time the storage provider guarantees that your data and services are available to you. This is usually documented as a percent of time per year, e.g. 99.9% (or three nines) availability means you will be unable to access your data for no more than about ten minutes per week, or 8.77 hours per year. Data centers often plan downtime for maintenance, which is acceptable as long as you have no immediate need of the data during those maintenance windows.
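Those downtime numbers fall out of simple arithmetic; here is a sketch converting an availability percentage into maximum yearly downtime:

```python
def max_downtime_hours_per_year(availability_percent: float) -> float:
    """Hours per year a service at the given availability may be unreachable."""
    return (1.0 - availability_percent / 100.0) * 365.25 * 24.0

for pct in (99.9, 99.99):
    print(f"{pct}% -> {max_downtime_hours_per_year(pct):.2f} hours/year")
# 99.9% -> 8.77 hours/year
# 99.99% -> 0.88 hours/year
```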
What availability is suitable for your data depends, of course, on how you’re using it. If you’re running an e-commerce site, reservation service, or a site that requires real-time transactions, then availability can be expressed in real dollars for any unexpected downtime. If you are simply storing backups, or serving media for a website that doesn’t get a lot of traffic, you probably can live with the service being unavailable on occasion.
There are of course no guarantees for connectivity issues that affect availability that are out of the control of the storage provider, such as internet outages, bad connections, or power losses affecting your connection to the storage provider.
Guarantees of Availability
Your cloud service provider should both publish and guarantee availability. Much like an insurance policy, the guarantee should be in terms that compensate you if the provider falls short of the guaranteed availability metrics. Naturally, the better the guarantee and the greater the availability, the more reliable and expensive the service will be.
Be sure to read the service level agreement (SLA) closely to see how your vendor defines availability. One provider might count the service as available (zero downtime) as long as a single internet client can access any one service, while another might require that all services be reachable from multiple internet service providers and countries before declaring the service available.
The Bottom Line on Data Durability and Availability
The bottom line is that no number of nines can absolutely protect your data. Human error or acts of nature can always intercede to make the best plans to protect data go awry. The decision you should make is to decide how important the data is to you and whether you can afford to not have access to it temporarily or to lose it completely. That will guide what strategy or vendor you should use to protect that data.
Generally, having multiple copies of your data in different places, using reliable vendors for storage providers, and making sure that the infrastructure storing your data and your access to it will be supported (power, service payments, etc), will go a long way in ensuring that your data will continue to be stable and there when you need it.
A lot has changed in the four years since Brian Beach wrote a post announcing Backblaze Vaults, our software architecture for cloud data storage. Just looking at how the major statistics have changed, we now have over 100,000 hard drives in our data centers instead of the 41,000 mentioned in the original post. We have three data centers (soon four) instead of one data center. We’re approaching one exabyte of data stored for our customers (almost seven times the 150 petabytes back then), and we’ve recovered over 41 billion files for our customers, up from the 10 billion in the 2015 post.
In the original post, we discussed having durability of seven nines. Shortly thereafter, it was upped to eight nines. In July of 2018, we took a deep dive into the calculation and found our durability closer to eleven nines (and went into detail on the calculations used to arrive at that number). And, as followers of our Hard Drive Stats reports will be interested in knowing, we’ve just started using our first 16 TB drives, which are twice the size of the biggest drives we used back at the time of this post — then a whopping 8 TB.
We’ve updated the details here and there in the text from the original post that was published on our blog on March 11, 2015. We’ve left the original 135 comments intact, although some of them might be non sequiturs after the changes to the post. We trust that you will be able to sort out the old from the new and make sense of what’s changed. If not, please add a comment and we’ll be happy to address your questions.
Storage Vaults form the core of Backblaze’s cloud services. Backblaze Vaults are not only incredibly durable, scalable, and performant, but they dramatically improve availability and operability, while still being incredibly cost-efficient at storing data. Back in 2009, we shared the design of the original Storage Pod hardware we developed; here we’ll share the architecture and approach of the cloud storage software that makes up a Backblaze Vault.
Backblaze Vault Architecture for Cloud Storage
The Vault design follows the overriding design principle that Backblaze has always followed: keep it simple. As with the Storage Pods themselves, the new Vault storage software relies on tried and true technologies used in a straightforward way to build a simple, reliable, and inexpensive system.
A Backblaze Vault is the combination of the Backblaze Vault cloud storage software and the Backblaze Storage Pod hardware.
Putting The Intelligence in the Software
Another design principle for Backblaze is to anticipate that all hardware will fail and build intelligence into our cloud storage management software so that customer data is protected from hardware failure. The original Storage Pod systems provided good protection for data and Vaults continue that tradition while adding another layer of protection. In addition to leveraging our low-cost Storage Pods, Vaults take advantage of the cost advantage of consumer-grade hard drives and cleanly handle their common failure modes.
Distributing Data Across 20 Storage Pods
A Backblaze Vault comprises 20 Storage Pods, with the data evenly spread across all 20 pods. Each Storage Pod in a given vault has the same number of drives, and the drives are all the same size.
Drives in the same drive position in each of the 20 Storage Pods are grouped together into a storage unit we call a tome. Each file is stored in one tome and is spread out across the tome for reliability and availability.
Every file uploaded to a Vault is divided into pieces before being stored. Each of those pieces is called a shard. Parity shards are computed to add redundancy, so that a file can be fetched from a vault even if some of the pieces are not available.
Each file is stored as 20 shards: 17 data shards and three parity shards. Because those shards are distributed across 20 Storage Pods, the Vault is resilient to the failure of a Storage Pod.
Files can be written to the Vault when one pod is down and still have two parity shards to protect the data. Even in the extreme and unlikely case where three Storage Pods in a Vault lose power, the files in the vault are still available because they can be reconstructed from any of the 17 pods that are available.
Each of the drives in a Vault has a standard Linux file system, ext4, on it. This is where the shards are stored. There are fancier file systems out there, but we don’t need them for Vaults. All that is needed is a way to write files to disk and read them back. Ext4 is good at handling power failure on a single drive cleanly without losing any files. It’s also good at storing lots of files on a single drive and providing efficient access to them.
Compared to a conventional RAID, we have swapped the layers here by putting the file systems under the replication. Usually, RAID puts the file system on top of the replication, which means that a file system corruption can lose data. With the file system below the replication, a Vault can recover from a file system corruption because a single corrupt file system can lose at most one shard of each file.
Creating Flexible and Optimized Reed-Solomon Erasure Coding
Just like RAID implementations, the Vault software uses Reed-Solomon erasure coding to create the parity shards. But, unlike Linux software RAID, which offers just one or two parity blocks, our Vault software allows for an arbitrary mix of data and parity. We are currently using 17 data shards plus three parity shards, but this could be changed on new vaults in the future with a simple configuration update.
The beauty of Reed-Solomon is that we can then re-create the original file from any 17 of the shards. If one of the original data shards is unavailable, it can be re-computed from the other 16 original shards, plus one of the parity shards. Even if three of the original data shards are not available, they can be re-created from the other 17 data and parity shards. Matrix algebra is awesome!
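The read and write rules above reduce to simple thresholds on shard counts. The 17/3 split and the two-parity write policy come from the text; the function names and this sketch are ours:

```python
DATA_SHARDS = 17
PARITY_SHARDS = 3
TOTAL_SHARDS = DATA_SHARDS + PARITY_SHARDS  # 20, one shard per Storage Pod

def file_readable(available_shards: int) -> bool:
    """Any 17 of the 20 shards are enough to reconstruct a file."""
    return available_shards >= DATA_SHARDS

def writes_allowed(healthy_pods: int, min_parity: int = 2) -> bool:
    """Writes proceed while at least two parity shards of redundancy remain."""
    return healthy_pods >= DATA_SHARDS + min_parity

assert file_readable(TOTAL_SHARDS - 3)       # three pods down: still readable
assert not file_readable(16)                 # 16 shards are not enough
assert writes_allowed(19)                    # one pod down: still writable
assert not writes_allowed(18)                # two pods down: reads only
```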
Handling Drive Failures
The reason for distributing the data across multiple Storage Pods and using erasure coding to compute parity is to keep the data safe and available. How are different failures handled?
If a disk drive just up and dies, refusing to read or write any data, the Vault will continue to work. Data can be written to the other 19 drives in the tome, because the policy setting allows files to be written as long as there are two parity shards. All of the files that were on the dead drive are still available and can be read from the other 19 drives in the tome.
When a dead drive is replaced, the Vault software will automatically populate the new drive with the shards that should be there; they can be recomputed from the contents of the other 19 drives.
A Vault can lose up to three drives in the same tome at the same moment without losing any data, and the contents of the drives will be re-created when the drives are replaced.
Handling Data Corruption
Disk drives try hard to correctly return the data stored on them, but once in a while they return the wrong data, or are just unable to read a given sector.
Every shard stored in a Vault has a checksum, so that the software can tell if it has been corrupted. When that happens, the bad shard is recomputed from the other shards and then re-written to disk. Similarly, if a shard just can’t be read from a drive, it is recomputed and re-written.
Conventional RAID can reconstruct a drive that dies, but does not deal well with corrupted data because it doesn’t checksum the data.
Each vault is assigned a number. We carefully designed the numbering scheme to allow for a lot of vaults to be deployed, and designed the management software to handle scaling up to that level in the Backblaze data centers.
The overall design scales very well because file uploads (and downloads) go straight to a vault, without having to go through a central point that could become a bottleneck.
There is an authority server that assigns incoming files to specific Vaults. Once that assignment has been made, the client then uploads data directly to the Vault. As the data center scales out and adds more Vaults, the capacity to handle incoming traffic keeps going up. This is horizontal scaling at its best.
We could deploy a new data center with 10,000 Vaults holding 16TB drives and it could accept uploads fast enough to reach its full capacity of 160 exabytes in about two months!
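A back-of-envelope check of that claim, using the figures stated in the post (10,000 Vaults, 20 Gbps per Vault, 160 exabytes total):

```python
# Time to fill a hypothetical 10,000-Vault data center at full ingest rate.
vaults = 10_000
per_vault_gbps = 20
total_bytes_per_sec = vaults * per_vault_gbps * 1e9 / 8  # bits -> bytes
capacity_bytes = 160e18                                  # 160 exabytes
days = capacity_bytes / total_bytes_per_sec / 86_400
print(f"{days:.0f} days")  # -> 74 days, roughly two months
```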
Backblaze Vault Benefits
The Backblaze Vault architecture has six benefits:
1. Extremely Durable
The Vault architecture is designed for 99.999999% (eight nines) annual durability (now 11 nines — Editor). At cloud-scale, you have to assume hard drives die on a regular basis, and we replace about 10 drives every day. We have published a variety of articles sharing our hard drive failure rates.
The beauty with Vaults is that not only does the software protect against hard drive failures, it also protects against the loss of entire Storage Pods or even entire racks. A single Vault can have three Storage Pods — a full 180 hard drives — die at the exact same moment without a single byte of data being lost or even becoming unavailable.
2. Infinitely Scalable
A Backblaze Vault comprises 20 Storage Pods, each with 60 disk drives, for a total of 1,200 drives. Depending on the size of the hard drive, each vault will hold:
12TB hard drives => 12.1 petabytes/vault (Deploying today.)
14TB hard drives => 14.2 petabytes/vault (Deploying today.)
16TB hard drives => 16.2 petabytes/vault (Small-scale testing.)
18TB hard drives => 18.2 petabytes/vault (Announced by WD & Toshiba.)
20TB hard drives => 20.2 petabytes/vault (Announced by Seagate.)
At our current growth rate, Backblaze deploys one to three Vaults each month. As the growth rate increases, the deployment rate will also increase. We can incrementally add more storage by adding more and more Vaults. Without changing a line of code, the current implementation supports deploying 10,000 Vaults per location. That’s 160 exabytes of data in each location. The implementation also supports up to 1,000 locations, which enables storing a total of 160 zettabytes! (Also known as 160,000,000,000,000 GB.)
3. Always Available
Data backups have always been highly available: if a Storage Pod was in maintenance, the Backblaze online backup application would contact another Storage Pod to store data. Previously, however, if a Storage Pod was unavailable, some restores would pause. For large restores this was not an issue since the software would simply skip the Storage Pod that was unavailable, prepare the rest of the restore, and come back later. However, for individual file restores and remote access via the Backblaze iPhone and Android apps, it became increasingly important to have all data be highly available at all times.
The Backblaze Vault architecture enables both data backups and restores to be highly available.
With the Vault arrangement of 17 data shards plus three parity shards for each file, all of the data is available as long as 17 of the 20 Storage Pods in the Vault are available. This keeps the data available while allowing for normal maintenance and rare expected failures.
4. Highly Performant
The original Backblaze Storage Pods could individually accept 950 Mbps (megabits per second) of data for storage.
The new Vault pods have more overhead, because they must break each file into pieces, distribute the pieces across the local network to the other Storage Pods in the vault, and then write them to disk. In spite of this extra overhead, the Vault is able to achieve 1,000 Mbps of data arriving at each of the 20 pods.
This capacity required a new type of Storage Pod that could handle this volume. The net of this: a single Vault can accept a whopping 20 Gbps of data.
Because there is no central bottleneck, adding more Vaults linearly adds more bandwidth.
5. Operationally Easier
When Backblaze launched in 2008 with a single Storage Pod, many of the operational analyses (e.g. how to balance load) could be done on a simple spreadsheet and manual tasks (e.g. swapping a hard drive) could be done by a single person. As Backblaze grew to nearly 1,000 Storage Pods and over 40,000 hard drives, the systems we developed to streamline and operationalize the cloud storage became more and more advanced. However, because our system relied on Linux RAID, there were certain things we simply could not control.
With the new Vault software, we have direct access to all of the drives and can monitor their individual performance and any indications of upcoming failure. And, when those indications say that maintenance is needed, we can shut down one of the pods in the Vault without interrupting any service.
6. Astoundingly Cost Efficient
Even with all of these wonderful benefits that Backblaze Vaults provide, if they raised costs significantly, it would be nearly impossible for us to deploy them since we are committed to keeping our online backup service affordable for completely unlimited data. However, the Vault architecture is nearly cost neutral while providing all these benefits.
When we were running on Linux RAID, we used RAID6 over 15 drives: 13 data drives plus two parity. That’s 15.4% storage overhead for parity.
With Backblaze Vaults, we wanted to be able to do maintenance on one pod in a vault and still have it be fully available, both for reading and writing. And, for safety, we weren’t willing to have fewer than two parity shards for every file uploaded. Using 17 data plus three parity drives raises the storage overhead just a little bit, to 17.6%, but still gives us two parity drives even in the infrequent times when one of the pods is in maintenance. In the normal case when all 20 pods in the Vault are running, we have three parity drives, which adds even more reliability.
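The two overhead figures quoted above are just the ratio of parity drives to data drives in each layout:

```python
# Parity overhead: parity drives divided by data drives.
raid6 = 2 / 13   # old layout: RAID6 over 15 drives, 13 data + 2 parity
vault = 3 / 17   # Vault layout: 17 data + 3 parity shards per tome
print(f"{raid6:.1%} vs {vault:.1%}")  # -> 15.4% vs 17.6%
```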
Backblaze’s cloud storage Vaults deliver 99.999999% (eight nines) annual durability (now 11 nines — Editor), horizontal scalability, and 20 Gbps of per-Vault performance, while being operationally efficient and extremely cost effective. Driven from the same mindset that we brought to the storage market with Backblaze Storage Pods, Backblaze Vaults continue our singular focus of building the most cost-efficient cloud storage available anywhere.
• • •
Note: This post was updated from the original version posted on March 11, 2015.
Eighteen months ago, the few remaining LTO tape drive manufacturers announced the availability of LTO-8, the latest generation of the Linear Tape-Open storage technology. Yet today, almost no one is actually writing data to LTO-8 tapes. It’s not that people aren’t interested in upgrading to the denser LTO-8 format that offers 12 TB per cartridge, twice LTO-7’s six TB capacity. It’s simply that the two remaining LTO tape manufacturers are locked in a patent infringement battle. And that means LTO-8 tapes are off the market indefinitely.
The pain of this delay is most acute for media professionals, who are always quick to adopt higher capacity storage media for video and audio files that are notorious storage hogs. As cameras get more sophisticated, capturing in higher resolutions and higher frame rates, the storage capacity required per hour of content shoots through the roof. For example, one hour of ProRes UltraHD requires 148.72 GB of storage, roughly four times the 37.35 GB required for one hour of ProRes HD-1080. Meanwhile, falling camera prices are encouraging production teams to use more cameras per shoot, further increasing the capacity requirements.
Since its founding, the LTO Consortium has prepared for storage growth by setting a goal of doubling tape density with each LTO generation and committed to releasing a new generation every two to three years. While this lofty goal might seem admirable to the LTO Consortium, it puts customers with earlier generations of LTO systems in a difficult position. New generation LTO drives at best can only read tapes from the two previous generations. So once a new generation is announced, the clock begins ticking on data stored on deprecated generations of tapes. Until you migrate the data to a newer generation, you’re stuck maintaining older tape drive hardware that may be no longer supported by manufacturers.
How Manufacturer Lawsuits Led to the LTO-8 Shortage
How the industry and the market arrived in this painful place is a tangled tale. The lawsuit and counter-lawsuit that led to the LTO-8 shortage is a patent infringement dispute between Fuji and Sony, the only two remaining manufacturers of LTO tape media. The timeline is complicated, starting in 2016 with Fujifilm suing Sony, then Sony counter-suing Fuji. By March 2019, US import bans of LTO products of both manufacturers were in place.
In the middle of these legal battles, LTO-8 drive manufacturers announced product availability in late 2017. But what about the LTO-8 tapes? Fujifilm says it is not currently manufacturing LTO-8 and has never sold it. And Sony says its US imports of LTO-8 have been stopped, and won’t comment on when shipments will resume, citing the dispute. So no LTO-8 for you!
Note that having only two LTO tape manufacturers is a root cause of this shortage. If there were still six LTO tape manufacturers like there were when LTO was launched in 2000, a dispute between two vendors might not have left the market in the lurch.
Weighing Your Options — LTO-8 Shortage Survival Strategies
If you’re currently using LTO for backup or archive, you have a few options for weathering the LTO-8 shortage.
The first option is to keep using your current LTO generation and wait until the disputes settle out completely before upgrading to LTO-8. The downside here is you’ll have to buy more and more LTO-7 or LTO-6 tapes that don’t offer the capacity you probably need if you’re storing higher resolution video or other capacity-hogging formats. And while you’ll be spending more on tapes than if you were able to use the higher capacity newer generation tapes, you’ll also know that anything you write to old-gen LTO tapes will have to be migrated sooner than planned. LTO’s short two to three year generation cycle doesn’t leave time for legal battles, and remember, manufacturers guarantee at most two generations of backward compatibility.
A second option is to go ahead and buy an LTO-8 library and use LTO-7 tapes that have been specially formatted for higher capacity, called LTO Type M (M8). When initialized as Type M media, an LTO-7 cartridge can hold nine TB of data instead of the standard six TB it holds when initialized as Type A. That puts it halfway to the 12 TB capacity of an LTO-8 tape. However, this extra capacity comes with several caveats:
Only new, unused LTO-7 cartridges can be initialized as Type M.
Once initialized as Type M, they cannot be changed back to LTO-7 Type A.
Only LTO-8 drives in libraries can read and write to Type M, not standalone drives.
Future LTO generations — LTO-9, LTO-10, etc. — will not be able to read LTO-7 Type M.
So if you go with LTO-7 Type M for greater capacity, realize it’s still LTO-7, not LTO-8, and when you move to LTO-9, you won’t be able to read those tapes.
Managing Tape is Complicated
If your brain hurts reading this as much as mine does writing this, it’s because managing tape is complicated. The devil is in the details, and it’s hard to keep them all straight. When you have years or even decades of content stored on LTO tape, you have to keep track of which content is on which generation of LTO, and ensure your facility has the drive hardware available to read them, and hope that nothing goes wrong with the tape media or the tape drives or libraries.
In general, new drives can read two generations back, but there are exceptions. For example, LTO-8 can’t read LTO-6 because the standard changed from GMR (giant magnetoresistance) heads to TMR (tunnel magnetoresistance) heads. The new TMR heads can write data more densely, which is what drives the huge increase in capacity. But that means you’ll want to keep an LTO-7 drive available to read LTO-5 and LTO-6 tapes.
Beyond these considerations for managing the tape storage long-term, there are the day-to-day hassles. If you’ve ever been personally responsible for managing backup and archive for your facility, you’ll know that it’s a labor-intensive, never-ending chore that takes time from your real job. And if your setup doesn’t allow users to retrieve data themselves, you’re effectively on-call to pull data off the tapes whenever it’s needed.
A Third Option — Migrate from LTO to Cloud Storage
If neither of these options to the LTO-8 crisis sounds appealing, there is an alternative: cloud storage. Cloud storage removes the complexity of tape while reducing costs. How much can you save in media and labor costs? We’ve calculated it for you in LTO Versus Cloud Storage Costs — the Math Revealed. And cloud storage makes it easy to give users access to files, either through direct access to the cloud bucket or through one of the integrated applications offered by our technology partners.
At Backblaze, we have a growing number of customers who shifted from tape to our B2 Cloud Storage and never looked back. Customers such as Austin City Limits, who preserved decades of concert historical footage by moving to B2; Fellowship Church, who eliminated Backup Thursdays and freed up staff for other tasks; and American Public Television, who adopted B2 in order to move away from tape distribution to its subscribers. What they’ve found is that B2 made operations simpler and their data more accessible without breaking their budget.
Another consideration: once you migrate your data to B2 cloud storage, you’ll never have to migrate again when LTO generations change or when the media ages. Backblaze takes care of making sure your data is safe and accessible on object storage, and migrates your data to newer disk technologies over time with no disruption to you or your users.
In the end, the problem with tape isn’t the media, it’s the complexity of managing it. It’s a well-known maxim that the time you spend managing how you do your work takes time away from what you do. Having to deal with multiple generations of both tape and tape drives is a good example of an overly complex system. With B2 Cloud Storage, you can get all the economical advantages of tape as well as the disaster recovery advantages of your data being stored away from your facility, without the complexity and the hassles.
With no end in sight to this LTO-8 shortage, now is a good time to make the move from LTO to B2. If you’re ready to start your move to always available cloud storage, Backblaze and our partners are ready to help you.
Migrate or Die, a Webinar Series on Migrating Assets and Archives to the Cloud
Three examples of common media workflows using a NAS
Top five benefits of using NAS for photography and videography
The camera might be firmly entrenched at the top of the list of essential equipment for photographers and videographers, but a strong contender for next on the list has to be network-attached storage (NAS).
A big reason for the popularity of NAS is that it’s one device that can do so many things that are needed in a media management workflow. Most importantly, NAS systems offer storage larger than any single hard drive, let you centralize photo storage, protect your files with backups and data storage virtualization (e.g. RAID), allow you to access files from anywhere, integrate with many media editing apps, and securely share media with coworkers and clients. And that’s just the beginning of the wide range of capabilities of NAS. It’s not surprising that NAS has become a standard and powerful data management hub serving the media professional.
This post is an overview of how NAS can fit into the professional or serious amateur photo and video workflow and some of the benefits you can receive from adding a NAS.
Essential NAS Capabilities
Firstly, NAS is a data storage device. It connects to your computer, office, and the internet, and supports loading and retrieving data from multiple computers in both local and remote locations.
The number of drives available for data storage is determined by how many bays the NAS has. As larger and faster disk drives become available, a NAS can be upgraded with larger drives to increase capacity, or multiple NAS can be used together. Solid-state drives (SSDs) can be used in a NAS for primary storage or as a cache to speed up data access.
Data Protection and Redundancy
NAS can be used for either primary or secondary local data storage. Whichever it is, it’s important to have an off-site backup of that data as well, to provide redundancy in case of accident, or in the event of a hardware or software problem. That off-site backup can be drives stored in another location, or more commonly these days, the cloud. The most popular NAS systems typically offer built-in tools to automatically sync files on your NAS to offsite cloud storage, and many also have app stores with backup and other types of applications.
Data is typically stored on the NAS using some form of error checking and virtual storage system, typically RAID 5 or RAID 6, to keep your data available even if one of the internal hard drives fails. However, if NAS is the only backup you have, and a drive fails, it can take quite a while to recover that data from a RAID device, and the delay only gets longer as drives increase in size. Avoiding this delay is the motivation for many to keep a redundant copy in the cloud so that it’s possible to access the files immediately, even before the RAID has completed its recovery.
If your primary data files are on an editing workstation, the NAS can be your local backup to make sure you keep your originals safe from accidental changes or loss. In some common editing workflows, the raw files are stored on the NAS and lower-resolution, smaller proxies are used for offline editing on the workstation — also called non-destructive or non-linear editing. Once edits are completed, the changes are written back to the NAS. Some applications, including Lightroom, maintain a catalog of files that is separate from the working files and is stored on the editing workstation. This catalog should be routinely backed up locally and remotely to protect it, as well.
The data on the NAS also can be protected with automated data backups or snapshots that protect data in case of loss, or to retrieve an earlier version of a file. A particularly effective plan is to schedule off-hours backups to the cloud to complete the off-site component of the recommended 3-2-1 backup strategy.
Data Accessibility and Sharing
Data can be loaded onto the NAS directly through a USB or SD card slot, if available, or through any device available via the local network or internet. Another possibility is to have a directory/folder on a local computer that automatically syncs any files dropped there to the NAS.
Once on the NAS, files can be shared with coworkers, clients, family, and friends. The NAS can be accessed via the internet from anywhere, so you can easily share work in progress or final media presentations. Access can be configured by file, directory/folder, group, or by settings in the particular application you are using. NAS can be set up with a different user and permission structure than your computer(s), making it easy to grant access to particular folders, and keeping the security separate from however local computers are set up. With proper credentials, a wide range of mobile apps or a web browser can be used to access the data on the NAS.
Media Editing Integration
It’s common for those using applications such as Adobe Lightroom to keep the original media on the NAS and work on a proxy on the local computer. This speeds up the workflow and protects the original media files. Similarly, for video, some devices are fast enough to support NLE (non-linear editing), and therefore support using the NAS for source and production media but allow editing without changing the source files. Popular apps that support NLE include Adobe Premiere, Apple Final Cut Pro X, and Avid Media Composer.
Flexibility and Apps
NAS from Synology, QNAP, FreeNAS/TrueNAS, Morro Bay, and others offer a wide range of apps that extend the functionality of the device. You can easily turn a NAS into a media server that streams audio and video content to TVs and other devices on your network. You can set up a NAS to automatically perform backups of your computers, or configure that NAS as a file server, a web server, or even a telephone system. Some home offices and small businesses have even completely replaced office servers with NAS.
Examples of Common Media Workflows Using a NAS
The following are three examples of how a NAS device can fit into a media production workflow.
Example One — A Home Studio
NAS is a great choice for a home studio that needs additional data storage, file sharing, cloud backup, and secure remote access. NAS is a better choice than directly-attached storage because it can have security separate from that of local computers and is accessible both locally and via the internet, even when individual workstations might be turned off or disconnected.
NAS can provide centralized backup using common backup apps, including Time Machine and ChronoSync on Mac, or Backup and Restore and File History on Windows.
To back up to the cloud, major NAS providers, including Synology, QNAP, Morro Data, and FreeNAS/TrueNAS include apps that can automatically back up NAS data to B2 or other destinations on the schedule of your choice.
Example Two — A Distributed Media Company with Remote Staff
The connectivity of NAS makes it an ideal hub for a distributed business. It provides a central location for files that can be reliably protected with RAID, backups, and access security, yet available to any authorized staff person no matter where they are located. Professional presentations are easy to do with a range of apps and integrations available for NAS. Clients can be given controlled access to review drafts and final proofs, as well.
Example Three — Using NAS with Photo/Video Editing Applications
Many media pros have turned to NAS for storing their ever-growing photos and video data files. Frequently, these users will optimize their workstation for the editing or cataloging application of their choice using fast central and graphics processors, SSD drives, and large amounts of RAM, and offload the data files to the NAS.
While Adobe Lightroom requires that its catalog be kept on a local or attached drive, the working files can be stored elsewhere. Some users have adopted the digital negative (DNG) format for working files, which avoids having to manage sidecar (XMP) files. XMP files are stored alongside the RAW files and record edits for file formats that don’t support saving that information natively, such as proprietary camera RAW files, including CRW, CR2, NEF, ORF, and so on.
With the right software and hardware, NAS also can play well in a shared video editing environment, enabling centralized storage of data with controlled access, file security, and supporting other functions such as video transcoding.
Top 5 Benefits of Using NAS for Photography and Videography
To recap, here are the top five benefits of adding NAS to your media workflow.
Flexible and expandable storage — fast, expandable and grows with your needs
Data protection — provides local file redundancy as well as an automated backup gateway to the cloud
Data accessibility and sharing — functions as a central media hub with internet connectivity and access control
Integration with media editing tools — works with editing and cataloging apps for photo and video
Flexibility and apps — NAS can perform many of the tasks once reserved for servers, with a wide range of apps to extend its capabilities
To learn more about what NAS can do for you, take a look at the posts on our blog on specific NAS devices from Synology, QNAP, FreeNAS/TrueNAS, and Morro Data, and about how to use NAS for photo and video storage. You’ll also find more information about how to connect NAS to the cloud. You can quickly find all posts on the NAS topic on our blog by following the NAS tag.
Do you have experience using NAS in a photo or video workflow? We’d love to hear about your experiences in the comments.
Apple’s annual WWDC is highlighting high-end desktop computing, but it’s laptop computers and the cloud that are driving a new wave of business and creative collaboration
WWDC, Apple’s annual megaconference for developers, kicks off this week, and Backblaze has team members on the ground to bring home insights and developments. Yet while everyone is drooling over the powerful new Mac Pro, we know that the majority of business users use a portable computer as their primary system for business and creative use.
The Rise of the Mobile, Always On, Portable Workstation
After all, these systems are extremely popular with users and the DevOps and IT teams that support them. Small and self-contained, with massive compute power, modern laptops have fast SSD drives and always-connected Wi-Fi, helping users be productive anywhere: in the field, on business trips, and at home. Surprisingly, companies today can deploy massive fleets of these notebooks with extremely lean staff. At the inaugural MacDevOps conference a few years ago, Google’s team shared that they managed 65,000 Macs with a team of seven admins!
With the trend towards leaner IT staffs, and the dangers of computers in the field being lost, dropped or damaged, having a reliable backup system that just works is critical. Despite the proliferation of teams using shared cloud documents and email, all of the other files on your laptop you’re working on — the massive presentation due next week or the project that’s not quite ready to share on Google Drive — all have no protection without backup, which is of course why Backblaze exists!
Cloud as a Shared Business Content Hub is Changing Everything
When a company is backing up users’ files comfortably to the cloud, the next natural step is to adopt cloud-based storage like Backblaze B2 for your teams. With over 750 petabytes of customer data under management, Backblaze has worked with businesses of every size as they adopt cloud storage. Each customer and business does so for different reasons.
In the past, a business department typically would get a share of a company’s NAS server and was asked to keep all of the department’s shared documents there. But outside the corporate firewall, it turns out these systems are hard to access remotely from the road. They require VPNs and a constant network connection to mount a corporate shared drive via SMB or NFS. And, of course, running out of space and storing large files was an ever present problem.
Sharing Business Content in the Cloud Can be Transformational for Businesses
When considering a move to cloud-based storage for your team, some benefits seem obvious, but others are more profound and show that cloud storage is emerging as a powerful, organizing platform for team collaboration.
Shifting to cloud storage delivers these well-known benefits:
Pay only for storage you actually need
Grow as large and as quickly as you might need
Service, management, and upgrades are built in to the service
Pay for service as you use it out of operating expenses vs. onerous capital expenses
But shifting to shared, cloud storage yields even more profound benefits:
Your Business Content is Easier to Organize and Manage: When your team’s content is in one place, it’s easier to organize and manage, and users can finally let go of stashing content all over your organization or leaving it on their laptops. All of your tools to mine and uncover your business’s content work more efficiently, and your users do as well.
You Get Simple Workflow Management Tools for Free: Storage can fit your business processes much easier with cloud storage and do it on the fly. If you ever need to set up separate storage for teams of users, or define read/write rules for specific buckets of content, it’s easy to configure with cloud storage.
You Can Replace External File-Sharing Tools: Since most email services balk at sending large files, it’s common to use a file sharing service to share big files with other users on your team or outside your organization. Typically this means having to download a massive file, re-upload it to a file-sharing service, and publish that file-sharing link. When your files are already in cloud, sharing it is as simple as retrieving a URL location.
In fact, this is exactly how Backblaze organizes and serves PDF content on our website like customer case studies. When you click on a PDF link on the Backblaze website, it’s served directly from one of these links from a B2 bucket!
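For a file in a public B2 bucket, that shareable link follows B2's friendly-URL pattern. A minimal sketch, with the download host, bucket, and file names here being hypothetical placeholders (each account's actual download host is shown on its B2 account page):

```python
# Composing a public B2 "friendly" download URL: https://<host>/file/<bucket>/<path>
from urllib.parse import quote

def b2_public_url(download_host, bucket_name, file_name):
    # quote() percent-encodes spaces etc. but leaves path slashes intact.
    return f"https://{download_host}/file/{bucket_name}/{quote(file_name)}"

print(b2_public_url("f000.backblazeb2.com", "acme-docs",
                    "case-studies/customer.pdf"))
# -> https://f000.backblazeb2.com/file/acme-docs/case-studies/customer.pdf
```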
You Get Instant, Simple Policy Control over Your Business or Shared Content: B2 offers simple-to-use tools to keep every version of a file as it’s created, keep just the most recent version, or choose how many versions you require. Want to have your shared content links time-out after a day or so? This and more is all easily done from your B2 account page:
You’re One Step Away from Sharing That Content Globally: As you can see, beyond individual file-sharing, cloud storage like Backblaze B2 can serve as your origin store for your entire website. With the emergence of content delivery networks (CDN), you’re now only a step away from sharing and serving your content globally.
Get Sophisticated Content Discovery and Compliance Tools for Your Business Content: With more and more business content in cloud storage, finding the content you need quickly across millions of files, or surfacing content that needs special storage consideration (for GDPR or HIPAA compliance, for example) is critical.
Ideally, you could have your own private, customized search engine across all of your cloud content, and that’s exactly what a new class of solutions provide.
With Acembly or Aparavi on Backblaze, you can build content indexes and offer deep search across all of your content, and automatically apply policy rules for management and retention.
Where Are You in the Cloud Collaboration Trend?
The trend to mobile, always-on workers building and sharing ever more sophisticated content around cloud storage as a shared hub is only accelerating. Users love the freedom to create, collaborate and share content anywhere. Businesses love the benefits of having all of that content in an easily managed repository that makes their entire business more flexible and less expensive to operate.
So, while device manufacturers like Apple may announce exciting Pro level workstations, the need for companies and teams to collaborate and be effective on the move is an even more important and compelling issue than ever before. The cloud is an essential element of that trend, and its importance shouldn’t be underestimated.
Many of us would concede that buildings housing data centers are generally pretty ordinary places. They’re often drab and bunker-like with few or no windows, and located in office parks or in rural areas. You usually don’t see signs out front announcing what they are, and, if you’re not in information technology, you might be hard pressed to guess what goes on inside.
If you’re observant, you might notice cooling towers for air conditioning and signs of heavy electrical usage as clues to their purpose. For most people, though, data centers go by unnoticed and out of mind. Data center managers like it that way, because the data stored in and passing through these data centers is the life’s blood of business, research, finance, and our modern, digital-based lives.
That’s why the exceptions to low-key and meh data centers are noteworthy. These unusual centers stand out for their design, their location, what the building was previously used for, or perhaps how they approach energy usage or cooling.
Let’s take a look at a handful of data centers that certainly are outside of the norm.
The Underwater Data Center
Microsoft’s rationale for putting a data center underwater makes sense. Most people live near water, they say, and their submersible data center is quick to deploy, and can take advantage of hydrokinetic energy for power and natural cooling.
Project Natick has produced an experimental, shipping-container-size prototype designed to process data workloads on the seafloor near Scotland’s Orkney Islands. It’s part of a years-long research effort to investigate manufacturing and operating environmentally sustainable, prepackaged datacenter units that can be ordered to size, rapidly deployed, and left to operate independently on the seafloor for years.
The Supercomputing Center in a Former Catholic Church
One might be forgiven for mistaking Torre Girona for any normal church, but this deconsecrated 20th century church currently houses the Barcelona Supercomputing Center, home of the MareNostrum supercomputer (Latin for Our Sea, the Roman name for the Mediterranean Sea). As part of the Polytechnic University of Catalonia, this supercomputer is used for a range of research projects, from climate change to cancer research, biomedicine, weather forecasting, and fusion energy simulations.
The Under-a-Mountain Bond Supervillain Data Center
Most data centers don’t have the extreme protection or history of the Bahnhof Data Center, which is located inside the ultra-secure former nuclear bunker Pionen, in Stockholm, Sweden. It is buried 100 feet below ground inside the White Mountains and secured behind 15.7 in. thick metal doors. It prides itself on its self-described Bond villain ambiance.
The Data Center That Can Survive a Category 5 Hurricane
Sometimes the location of the center comes first and the facility is hardened to withstand anticipated threats, such as Equinix’s NAP of the Americas data center in Miami, one of the largest single-building data centers on the planet (six stories and 750,000 square feet), which is built 32 feet above sea level and designed to withstand category five hurricane winds.
The MI1 facility provides access for the Caribbean and South and Central America “to more than 148 countries worldwide,” and is the primary network exchange between Latin America and the U.S., according to Equinix. Any outage in this data center could potentially cripple businesses passing information between these locations.
The center was put to the test in 2017 when Hurricane Irma, a Category 5 hurricane in the Caribbean, made landfall in Florida as a Category 4 hurricane. The storm caused extensive damage in Miami-Dade County, but the Equinix center survived.
The Data Center Cooled by Glacier Water
Located on Norway’s west coast, the Lefdal Mine Datacenter is built 150 meters into a mountain in what was formerly an underground mine for excavating olivine, also known as the gemstone peridot, a green, high-density mineral used in steel production. The data center is powered exclusively by renewable energy produced locally, while being cooled by water from the second largest fjord in Norway, which is 565 meters deep and fed by the water from four glaciers. As it’s in a mine, the data center is located below sea level, eliminating the need for expensive high-capacity pumps to lift the fjord’s water to the cooling system’s heat exchangers, contributing to the center’s power efficiency.
The World’s Largest Data Center
The Tahoe Reno 1 data center in The Citadel Campus in Northern Nevada, with 7.2 million square feet of data center space, is the world’s largest data center. It’s not only big, it’s powered by 100% renewable energy with up to 650 megawatts of power.
An Out of This World Data Center
If the cloud isn’t far enough above us to satisfy your data needs, Cloud Constellation Corporation plans to put your data into orbit. A constellation of eight low earth orbit (LEO) satellites, called SpaceBelt, will offer up to five petabytes of space-based secure data storage and services and will use laser communication links between the satellites to transmit data between different locations on Earth.
CCC isn’t the only player talking about space-based data centers, but it is the only one so far with $100 million in funding to make its plan a reality.
A Cloud Storage Company’s Modest Beginnings
OK, so our current data centers are not that unusual (with the possible exception of our now iconic Storage Pod design), but Backblaze wasn’t always the profitable and growing cloud services company that it is today. There was a time when Backblaze was just getting started, before we had almost an exabyte of customer data in storage, when we were figuring out how to make data storage work while keeping costs as low as possible for our customers.
The photo below is not exactly a data center, but it is the first data storage structure used by Backblaze to develop its storage infrastructure before going live with customer data. It was on the patio behind the Palo Alto apartment that Backblaze used for its first office.
The photos below (front and back) are of the very first data center cabinet that Backblaze filled with customer data. This was in 2009 in San Francisco, and just before we moved to a data center in Oakland where there was room to grow. Note the storage pod at the top of the cabinet. Yes, it’s made out of wood. (You have to start somewhere.)
Do You Know of Other Unusual Data Centers?
Do you know of another data center that should be on this list? Please tell us in the comments.
Since introducing B2 Cloud Storage nearly four years ago, we’ve been busy adding enhancements and new functionality to the service. We continually look for ways to make B2 more useful for our customers, be it through service level enhancements, partnerships with leading Compute providers, or lowering the industry’s lowest download price to 1¢/GB. Today, we’re pleased to announce the beta release of our newest functionality: Copy File.
What You Can Do With B2 Copy File
This new capability enables you to create a new file (or new part of a large file) that is a copy of an existing file (or range of an existing file). You can either copy over the source file’s metadata or specify new metadata for the new file that is created. This all occurs without having to download or reupload any data.
This has been one of our most requested features, as it unlocks:
Rename/Re-organize. The new capabilities give customers the ability to reorganize their files without having to download and reupload. This is especially helpful when trying to mirror the contents of a file system to B2.
Synthetic Backup. With the ability to copy ranges of a file, users can now leverage B2 for synthetic backup, i.e. uploading a full backup but then only uploading incremental changes (as opposed to reuploading the whole file with every change). This is particularly helpful for uses such as backing up VMs, where reuploading the entirety of the file every time it changes can be inefficient.
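If you want to try the new endpoint during the beta, here is a minimal sketch of calling b2_copy_file with Python’s standard library. The api_url and auth_token values (returned by b2_authorize_account) are placeholders, and the helper names are our own invention; consult the B2 API docs for the authoritative request format.

```python
import json
import urllib.request

def build_copy_request(source_file_id, new_file_name,
                       destination_bucket_id=None, byte_range=None):
    """Assemble the JSON body for a b2_copy_file call."""
    body = {
        "sourceFileId": source_file_id,
        "fileName": new_file_name,
        # COPY reuses the source file's metadata; REPLACE lets you supply new metadata.
        "metadataDirective": "COPY",
    }
    if destination_bucket_id is not None:
        # New in this release: copy into a different bucket in the same account.
        body["destinationBucketId"] = destination_bucket_id
    if byte_range is not None:
        body["range"] = byte_range  # e.g. "bytes=0-999999" for part of a file
    return body

def copy_file(api_url, auth_token, **kwargs):
    """POST the copy request; api_url and auth_token come from b2_authorize_account."""
    req = urllib.request.Request(
        f"{api_url}/b2api/v2/b2_copy_file",
        data=json.dumps(build_copy_request(**kwargs)).encode("utf-8"),
        headers={"Authorization": auth_token},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Omitting destination_bucket_id copies within the source file’s bucket, matching the default described above.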
We’re introducing these endpoints as a beta so that developers can provide us feedback before the endpoints go into production. Specifically, this means that the APIs may evolve as a result of the feedback we get. We encourage you to give Copy File a try and, if you have any comments, you can email our B2 beta team at firstname.lastname@example.org. Thanks!
VM Backup to B2 Using Veeam Backup & Replication and Morro Data CloudNAS
We are glad to show how Veeam Backup & Replication can work with Morro Data CloudNAS to keep the more recent backups on premises for fast recovery while archiving all backups in B2 Cloud Storage. CloudNAS not only caches the more recent backup files, but also simplifies the management of B2 Cloud Storage with a network share or drive letter interface.
–Paul Tien, Founder & CEO, Morro Data
VM backup and recovery is a critical part of IT operations that supports business continuity. Traditionally, IT has deployed an array of purpose-built backup appliances and applications to protect against server, infrastructure, and security failures. As VMs continue to spread in production, development, and verification environments, the expanding VM backup repository has become a major challenge for system administrators.
Because the VM backup footprint is usually quite large, cloud storage is increasingly being deployed for VM backup. However, cloud storage does not achieve the same performance level as on-premises storage for recovery operations. For this reason, cloud storage has been used as a tiered repository behind on-premises storage.
In this best practice guide, VM Backup to B2 Using Veeam Backup & Replication and Morro Data CloudNAS, we will show how Veeam Backup & Replication can work with Morro Data CloudNAS to keep the most recent backups on premises for fast recovery while archiving all backups in the retention window in Backblaze B2 cloud storage. CloudNAS caching not only provides buffer for most recent backup files, but also simplifies the management of on-premises storage and cloud storage as an integral backup repository.
Tell Us How You’re Backing Up Your VMs
If you’re backing up VMs to B2 using one of the solutions we’ve written about in this series, we’d like to hear from you in the comments about how it’s going.
Like many Backblaze customers, Nodecraft realized they could save a fortune by shifting their cloud storage to Backblaze and invest it elsewhere in growing their business. In this post that originally appeared on Nodecraft’s blog, Gregory R. Sudderth, Nodecraft’s Senior DevOps Engineer, shares the steps they took to first analyze, test, and then move that storage. — Skip Levens
TL;DR: Nodecraft moved 23TB of customer backup files from AWS S3 to Backblaze B2 in just 7 hours.
Nodecraft.com is a multiplayer cloud platform, where gamers can rent and use our servers to build and share unique online multiplayer servers with their friends and/or the public. As server owners run their game servers, backups are generated, including the servers’ files, game backups, and other data. It goes without saying that backup reliability is important for server owners.
In November 2018, it became clear to us at Nodecraft that we could improve our costs if we re-examined our cloud backup strategy. After looking at the current offerings, we decided to move our backups from Amazon’s S3 to Backblaze’s B2 service. This article describes how our team approached it, why, and what happened, specifically so we could share our experiences.
Because S3 and B2 (along with many other providers) are at least nearly equally* accessible, reliable, and available, our primary reason for moving our backups became pricing. As we got into the effort, other factors such as variety of API, quality of API, real-life workability, and customer service started to surface.
After looking at a wide variety of considerations, we decided on Backblaze’s B2 service. A big part of the costs of this operation is their bandwidth, which is amazingly affordable.
The price gap between the two object storage systems comes from the Bandwidth Alliance between Backblaze and Cloudflare, a group of providers that have agreed not to charge (or to heavily discount) fees for data moving between networks within the alliance (“egress” charges). We at Nodecraft use Cloudflare extensively, so this left only the egress charges from Amazon to Cloudflare to worry about.
In normal operations, our customers both constantly make backups as well as access them for various purposes and there has been no change to their abilities to perform these operations compared to the previous provider.
As with any change in providers, the change-over must be thought out with great attention to detail. When there were no quality issues previously and circumstances are such that a wide field of new providers can be considered, the final selection must be carefully evaluated. Our list of concerns included these:
Safety: we needed to move our files and ensure they remain intact, in a redundant way
Availability: the service must be both reliable and widely available ** (which means we needed to “point” at the right file after its move, during the entire process of moving all the files: different companies have different strategies, one bucket, many buckets, regions, zones, etc.)
API: we are experienced, so we are not crazy about proprietary file transfer tools
Speed: we needed to move the files in bulk and not be throttled by rate limitations, and…
…improper tuning could turn the operation into our own DDoS.
All these factors individually are good and important, but combined badly, they can add up to a significant service disruption. If things can move easily, quickly, and reliably, improper tuning could turn the operation into our own DDoS. We took thorough steps to make sure this wouldn’t happen, so an additional requirement was added:
Tuning: Don’t down your own services, or harm your neighbors
What this means to the lay person is “We have a lot of devices in our network, we can do this in parallel. If we do it at full-speed, we can make our multiple service providers not like us too much… maybe we should make this go at less than full speed.”
To embrace our own cloud processing capabilities, we knew we would have to take a two tier approach in both the Tactical (move a file) and Strategic (tell many nodes to move all the files) levels.
Our goals here are simple: we want to move all the files, move them correctly, and only once, but also make sure operations can continue while the move happens. This is key because if we had used one computer to move the files, it would take months.
The first step to making this work in parallel was to build a small web service that queued a single target file at a time to each worker node. This service provided a locking mechanism so that the same file wouldn’t be moved twice, whether concurrently or later. The timer for the lock to expire (with an error message) was set to a couple of hours. The service was intended to be accessed via simple tools such as curl.
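The checkout-and-lock behavior of that service might look something like the following sketch. This is an in-memory illustration with hypothetical names, not Nodecraft’s actual implementation, which was an HTTP service:

```python
import time

class FileLockQueue:
    """In-memory sketch of the move queue: hand out one file per request,
    with a lock that expires if the worker never reports back."""

    def __init__(self, file_ids, lock_ttl=2 * 3600):
        self.pending = list(file_ids)   # files not yet handed out
        self.locked = {}                # file_id -> lock expiry timestamp
        self.done = set()
        self.lock_ttl = lock_ttl        # the "couple hours" fail timer

    def checkout(self, now=None):
        """Return the next file to move, locking it, or None if nothing is free."""
        now = time.time() if now is None else now
        # Re-queue any locks that timed out (worker presumed failed).
        for fid, expires in list(self.locked.items()):
            if expires <= now:
                del self.locked[fid]
                self.pending.append(fid)
        if not self.pending:
            return None
        fid = self.pending.pop(0)
        self.locked[fid] = now + self.lock_ttl
        return fid

    def mark_done(self, fid):
        """Release the lock, cancel the timer, and mark the file DONE."""
        self.locked.pop(fid, None)
        self.done.add(fid)
```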
We deployed each worker node as a Docker container, spread across our Docker Swarm. Using the parameters in a docker stack file, we were able to define how many workers per node joined the task. This also ensured more expensive bandwidth regions like Asia Pacific didn’t join the worker pool.
Nodecraft has multiple fleets of servers spanning multiple datacenters, and our plan was to use spare capacity on most of them to move the backup files. We have experienced a consistent pattern of access of our servers by our users in the various data centers across the world, and we knew there would be availability for our file moving purposes.
Our goals in this part of the operation are also simple, but have more steps:
Get the name/ID/URL of a file to move which…
locks the file, and…
starts the fail timer
Get the file info, including size
DOWNLOAD: Copy the file to the local node (without limiting the node’s network availability)
Verify the file (size, ZIP integrity, hash)
UPLOAD: Copy the file to the new service (again without impacting the node)
Report “done” with new ID/URL location information to the Strategic level, which…
…releases the lock in the web service, cancels the timer, and marks the file DONE
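The steps above can be sketched as a worker loop. The queue, download, and upload hooks here are hypothetical stand-ins for the real web service and transfer tools, but the verification step (size, ZIP integrity, hash) is shown concretely:

```python
import hashlib
import time
import zipfile
from pathlib import Path

def verify_backup(path, expected_size, expected_sha1):
    """Step 5: check size, ZIP integrity, and hash before the upload step."""
    data = Path(path).read_bytes()
    if len(data) != expected_size:
        return False
    if hashlib.sha1(data).hexdigest() != expected_sha1:
        return False
    # zipfile.testzip() returns the first bad member name, or None if intact.
    with zipfile.ZipFile(path) as zf:
        return zf.testzip() is None

def worker_loop(queue, download, upload):
    """One worker: check out a file, move it, verify it, report done."""
    while True:
        job = queue.checkout()           # locks the file, starts the fail timer
        if job is None:
            break                        # nothing left to move
        local_path = download(job)       # throttled copy down from S3
        if verify_backup(local_path, job["size"], job["sha1"]):
            new_url = upload(local_path) # throttled copy up to B2
            queue.mark_done(job["id"], new_url)
        time.sleep(2)                    # small pause "at the top" between files
```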
The Kill Switch
In the case of a potential runaway, where even the in-band Docker Swarm commands themselves might stop responding, we decided to make sure we had a kill switch handy. In our case, it was our intrepid little web service: we made sure we could pause it. Looking back, it would be better if it used a consumable resource, such as a counter, or a value in a database cell. If we didn’t refresh the counter, then it would stop all on its own. More on “runaways” later.
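The consumable-counter idea is essentially a dead man’s switch. A minimal sketch, with names of our own choosing:

```python
class DeadMansSwitch:
    """Kill switch as a consumable resource: workers may proceed only while
    the budget is positive; an operator must keep topping it up."""

    def __init__(self, budget):
        self.budget = budget

    def refresh(self, amount):
        """Operator heartbeat: reset the budget to keep the operation alive."""
        self.budget = amount

    def may_proceed(self):
        """Consume one unit; if nobody refreshed us, the operation halts on its own."""
        if self.budget <= 0:
            return False
        self.budget -= 1
        return True
```

The advantage over a pause button is that it fails safe: if the controlling service itself becomes unreachable, nothing refreshes the counter and the workers wind down by default.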
Real Life Tuning
Our business has daily, weekly, and other cycles of activity that are predictable. Most important is our daily cycle, which trails after the Sun. We decided to use our nodes in low-activity areas to carry the work, and after testing, we found that with correct tuning this doesn’t affect the relatively light loads of the servers in a low-activity region. This was backed up by verifying no change in customer service load using our metrics and those of our CRM tools. Back to tuning.
Initially we tuned the DOWN file transfer speed equivalent to 3/4ths of what wget(1) could do. We thought “oh, the network traffic to the node will fit in-between this so it’s ok”. This is mostly true, but only mostly. This is a problem in two ways. The cause of the problems is that isolated node tests are just that—isolated. When a large number of nodes in a datacenter are doing the actual production file transfers, there is a proportional impact that builds as the traffic is concentrated towards the egress point(s).
Problem 1: you are being a bad neighbor on the way to the egress points. Ok, you say, “well, we pay for network access, let’s use it,” but there’s only so much to go around; obviously “all the ports of the switch have more bandwidth than the uplink ports,” so of course there will be limits to hit.
Problem 2: you are being your own bad neighbor to yourself. Again, if you end-up with your machines being network-near to each other in a network-coordinates kind of way, your attempts to “use all that bandwidth we paid for” will be throttled by the closest choke point, impacting only or nearly only yourself. If you’re going to use most of the bandwidth you CAN use, you might as well be mindful of it and choose where you will put the chokepoint, that the entire operation will create. If one is not cognizant of this concern, one can take down entire racks of your own equipment by choking the top-of-rack switch, or, other networking.
By reducing our 3/4ths-of-wget(1) tuning to 50% of what wget could do for a single file transfer, we saw our nodes still functioning properly. Your mileage will absolutely vary, and there’s hidden concerns in the details of how your nodes might or might not be near each other, and their impact on hardware in between them and the Internet.
Perhaps this is an annoying detail: based on previous experience, I put in some delays. We scripted these tools up in Python, with a Bourne shell wrapper to detect failures (there were some), and also because, for our upload step, we ended up going against our DNA and used the Backblaze upload utility. By the way, it is multi-threaded and really fast. But in the wrapping shell script, as a matter of course, in the main loop that talked to our API, I put in a sleep 2 statement. This creates a small pause “at the top” between files.
This ended up being key, as we’ll see in a moment.
How It (The Service, Almost) All Went Down
What’s past is sometimes not prologue. Independent testing in a single node, or even a few nodes, was not totally instructive to what really was going to happen as we throttled up the test. Now when I say “test” I really mean, “operation”.
Our initial testing was concluded “Tactically” as above, for which we used test files, and were very careful in the verification thereof. In general, we were sure that we could manage copying a file down (Python loop) and verifying (unzip -T) and operate the Backblaze b2 utility without getting into too much trouble…but it’s the Strategic level that taught us a few things.
Remembering a foggy past where “6% collisions on a 10BASE-T network and it’s game over”…yeah, that 6%. We throttled up the number of replicas in the Docker Swarm and didn’t have any problems. Good. “Alright.” Then we moved the throttle, so to speak, to the last detent.
We had nearly achieved self-DDoS.
It wasn’t all that bad, but, we were suddenly very, very happy with our 50%-of-wget(1) tuning, and our 2 second delays between transfers, and most of all, our kill switch.
TL;DR — Things went great.
There were a couple files that just didn’t want to transfer (weren’t really there on S3, hmm). There were some DDoS alarms that tripped momentarily. There was a LOT of traffic…and, then, the bandwidth bill.
Your mileage may vary, but there’s some things to think about with regards to your bandwidth bill. When I say “bill” it’s actually a few bills.
As per the diagram above, moving the file can trigger multiple bandwidth charges, especially as our customers began to download the files from B2 for instance deployment, etc. In our case, we now only had the S3 egress bill to worry about. Here’s why that works out:
We have group (node) discount bandwidth agreements with our providers
B2 is a member of the Bandwidth Alliance…
…and so is Cloudflare
We were accessing our S3 content through our (not free!) Cloudflare account public URLs, not by the (private) S3 URLs.
Without saying anything about our confidential arrangements with our service partners, the following are both generally true: you can talk to providers and sometimes work out reductions. Also, they especially like it when you call them (in advance) and discuss your plans to run their gear hard. For example, on another data move, one of the providers gave us a way to “mark” our traffic a certain way, and it would go through a quiet-but-not-often-traveled part of their network; win win!
Server and simple are words rarely seen together. However, we get consistent feedback from customers on the effectiveness of CloudBerry in removing challenges from backing up the most complex environments. Customers that switch to B2 & CloudBerry already realize savings of up to 75% over comparable backup solutions, and our joint solution has helped thousands of customers get their data offsite affordably.
As anybody who has had to wrangle overflowing servers knows, getting your data backed up to the cloud is important. Yet, customers with large datasets face their own challenges with the migration of their data, including upload times, bandwidth management, and more. Today, CloudBerry has released support for Backblaze’s B2 Fireball, and in doing so, the process of getting your servers backed up offsite has become even more efficient.
The B2 Fireball is an import service for securely migrating large data sets from your on-premises environments into B2 Cloud Storage. If you have 70 TB of server data that you are looking to get offsite, the initial upload can take over two months on a dedicated 100 Mbps line. This lengthy process doesn’t even factor in incremental uploads. By adding this frequently requested functionality to our joint solution, we are able to help customers back up their servers quicker, with even greater savings.
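The “over two months” figure is straightforward back-of-the-envelope arithmetic. This helper assumes a perfectly sustained line rate with no protocol overhead, so real-world uploads would take even longer:

```python
def transfer_days(terabytes, megabits_per_second):
    """Days needed to move a dataset at a sustained line rate.

    Assumes 1 TB = 10^12 bytes and no protocol or retry overhead.
    """
    bits = terabytes * 1e12 * 8
    seconds = bits / (megabits_per_second * 1e6)
    return seconds / 86400

# 70 TB over a dedicated 100 Mbps line:
# transfer_days(70, 100) -> about 65 days, i.e. the "over two months" above
```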
Using the B2 Fireball alongside CloudBerry, customers now have an affordable and viable method of backing up large datasets with no strain on their network. For a customer with 70 TB of server data, using the B2 Fireball with CloudBerry and B2 can get your data offsite and secure in 1/10th of the time of uploading over a dedicated line.
When using CloudBerry with the B2 Fireball, there is no need to increase your bandwidth or saturate your connection. To get started with the Backblaze Fireball, we will send a Fireball to wherever you are keeping your data. The Fireball is equipped with 1 Gbps connectivity. Using the CloudBerry client on your servers, you simply point your backup destination at the Fireball. Once the data is uploaded, send the Fireball back to Backblaze and your archive gets uploaded inside our secure data center (no bandwidth used on your network). After the Fireball upload is complete, you can point your CloudBerry client at your B2 bucket and CloudBerry seamlessly syncs everything. You can then continue to upload incremental data to Backblaze B2 from a single server or multiple ones, all managed through a single pane of glass. Best of all, the entire process is available fully on-demand. To get started, visit our Fireball webpage to order your Fireball.
For many of our customers, this solution has given them the ability to move their server backups offsite. Besides being simple and generating significant savings, here are the other key benefits customers get from the CloudBerry & B2 solution:
Automated backups. Increase efficiency by automating backups to avoid data loss.
Web-based admin console. Set backup plans once, deploy across multiple servers, and manage those servers, all from a single location.
Security. Client side encryption and the ability to set a private key secure your server data and offers protection from hackers.
Versioning. Set a minimum number of file versions to protect yourself from ransomware and avoid getting stuck depending on a single version of an important file.
Retention. Set retention policies to comply with company policies or regulatory agencies.
File level backups. These are typically faster and easier to execute when it comes to restoring a single file.
Native application support. Back up Exchange and SQL Server appropriately.
Reliability. Be sure that your data is secure with notifications of failing backups.
Regardless of whether you have a single server or multiple ones, CloudBerry and Backblaze B2 provide the features necessary to ensure that your server data is securely and affordably backed up offsite. Our joint, low-touch solution only needs to be set up one time and your backup plan can be easily replicated and deployed across as many servers as needed. With Backblaze and CloudBerry, your backup plan can match your infrastructure, minimizing cost, so that you pay for only what you need.
As of March 31, 2019, Backblaze had 106,238 spinning hard drives in our cloud storage ecosystem spread across three data centers. Of that number, there were 1,913 boot drives and 104,325 data drives. This review looks at the Q1 2019 and lifetime hard drive failure rates of the data drive models currently in operation in our data centers and provides a handful of insights and observations along the way. In addition, we have a few questions for you to ponder near the end of the post. As always, we look forward to your comments.
Hard Drive Failure Stats for Q1 2019
At the end of Q1 2019, Backblaze was using 104,325 hard drives to store data. For our evaluation we remove from consideration those drives that were used for testing purposes and those drive models for which we did not have at least 45 drives (see why below). This leaves us with 104,130 hard drives. The table below covers what happened in Q1 2019.
Notes and Observations
If a drive model has a failure rate of 0%, it means there were no drive failures of that model during Q1 2019. The two drive models listed with zero failures in Q1 were the 4 TB and 5 TB Toshiba models. Neither has a large enough number of drive days to be statistically significant, but in the case of the 5 TB model, you have to go back to Q2 2016 to find the last drive failure we had of that model.
There were 195 drives (104,325 minus 104,130) that were not included in the list above because they were used as testing drives or we did not have at least 45 of a given drive model. We use 45 drives of the same model as the minimum number when we report quarterly, yearly, and lifetime drive statistics. The use of 45 drives is historical in nature as that was the number of drives in our original Storage Pods. Beginning next quarter that threshold will change; we’ll get to that shortly.
The Annualized Failure Rate (AFR) for Q1 is 1.56%. That’s as high as the quarterly rate has been since Q4 2017, and it’s part of an overall upward trend we’ve seen in the quarterly failure rates over the last few quarters. Let’s take a closer look.
We noted in previous reports that using the quarterly reports is useful in spotting trends about a particular drive or even a manufacturer. Still, you need to have enough data (drive count and drive days) in each observed period (quarter) to make any analysis valid. To that end the chart below uses quarterly data from Seagate and HGST drives while leaving out Toshiba and WDC drives as we don’t have enough drives from those manufacturers over the course of the last three years.
Over the last three years, the trend for both Seagate and HGST annualized failure rates had improved, i.e. gone down. While Seagate has reduced their failure rate over 50% during that time, the upward trend over the last three quarters requires some consideration. We’ll take a look at this and let you know if we find anything interesting in a future post.
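For readers who want to work with the raw data themselves, the annualized failure rate is simply drive failures scaled to a full year of drive days. A quick sketch with illustrative numbers (not taken from the tables above):

```python
def annualized_failure_rate(failures, drive_days):
    """AFR (%): failures per drive day, scaled to a 365-day year."""
    return (failures / drive_days) * 365 * 100

# Illustrative example: 4 failures over 100,000 drive days in a quarter
# annualized_failure_rate(4, 100_000) -> 1.46%
```

This is also why a small drive population produces noisy numbers: with only a few thousand drive days, a single failure swings the AFR by a full percentage point or more.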
Changing the Qualification Threshold
As reported over the last several quarters, we’ve been migrating from lower density drives, 2, 3, and 4 TB drives, to larger 10, 12, and 14 TB hard drives. At the same time, we have been replacing our stand-alone 45-drive Storage Pods with 60-drive Storage Pods arranged into the Backblaze Vault configuration of 20 Storage Pods per vault. In Q1, the last stand-alone 45-drive Storage Pod was retired. Therefore, using 45 drives as the threshold for qualification to our quarterly report seems antiquated. This is a good time to switch to using Drive Days as the qualification criteria. In reviewing our data, we have decided to use 5,000 Drive Days as the threshold going forward. The exception: any current drive models we already report on, such as the Toshiba 5 TB model with about 4,000 drive days each quarter, will continue to be included in our Hard Drive Stats reports.
Fewer Drives = More Data
Those of you who follow our quarterly reports might have observed that the total number of hard drives in service decreased in Q1 by 648 drives compared to Q4 2018, yet we added nearly 60 petabytes of storage. You can see what changed in the chart below.
Lifetime Hard Drive Stats
The table below shows the lifetime failure rates for the hard drive models we had in service as of March 31, 2019. This is over the period beginning in April 2013 and ending March 31, 2019.
Predictions for the Rest of 2019
As 2019 unfolds, here are a few guesses as to what might happen over the course of the year. Let’s see what you think.
By the end of 2019, which, if any, of the following things will happen? Let us know in the comments.
Backblaze will continue to migrate out 4 TB drives and will have fewer than 15,000 by the end of 2019: we currently have about 35,000.
We will have installed at least twenty 20 TB drives for testing purposes.
Backblaze will go over 1 exabyte (1,000 petabytes) of available cloud storage. We are currently at about 850 petabytes of available storage.
We will have installed, for testing purposes, at least 1 HAMR-based drive from Seagate and/or 1 MAMR-based drive from Western Digital.
The Hard Drive Stats Data
The complete data set used to create the information used in this review is available on our Hard Drive Test Data web page. You can download and use this data for free for your own purpose. All we ask are three things: 1) You cite Backblaze as the source if you use the data, 2) You accept that you are solely responsible for how you use the data, and, 3) You do not sell this data to anyone — it is free.
If you just want the summarized data used to create the tables and charts in this blog post you can download the ZIP file containing the MS Excel spreadsheet.
Good luck and let us know if you find anything interesting.
Where Does the Media Industry Really Use Cloud Storage?
Our new cloud survey results might surprise you.
Predicting which promising new technologies will be adopted quickly, which ones will take longer, and which ones will fade away is not always easy. When the iPhone was introduced in 2007, only 6% of the US population had smartphones. In less than 10 years, over 80% of Americans owned smartphones. In contrast, video telephone calls demonstrated at the 1964 New York World’s Fair only became commonplace 45 years later with the advent of FaceTime. And those flying cars people have dreamed of since the 1950s? Don’t hold your breath.
What about cloud storage? Who is adopting it today and for what purposes?
“While M&E professionals are not abandoning existing storage alternatives, they increasingly see the public cloud in storage applications as simply another professional tool to achieve their production, distribution, and archiving goals. For the future, that trend looks to continue as the public cloud takes on an even greater share of their overall storage requirements.”
— Phil Kurz, contributing editor, TV Technology
At Backblaze, we have a front-line view of how customers use cloud for storage. And based on the media-oriented customers we’ve directly worked with to integrate cloud storage, we know they’re using cloud storage throughout the workflow: backing up files during content creation (UCSC Silicon Valley), managing production storage more efficiently (WunderVu), archiving of historical content libraries (Austin City Limits), hosting media files for download (American Public Television), and even editing cloud-based video (Everwell).
We wanted to understand more about how the broader industry uses cloud storage and their beliefs and concerns about it, so we could better serve the needs of our current customers and anticipate what their needs will be in the future.
We decided to sponsor an in-depth survey with TV Technology, a media company that for over 30 years has been an authority for news, analysis and trend reports serving the media and entertainment industries. While TV Technology had conducted a similar survey in 2015, we thought it’d be interesting to see how the industry outlook has evolved. Based on our 2019 results, it certainly has. As a quick example, security was a concern for 71% of respondents in 2015. This year, only 38% selected security as an issue at all.
Survey Methodology — 246 Respondents and 15 Detailed Questions
For the survey, TV Technology queried 246 respondents, primarily from production and post-production studios and broadcasters, but also other market segments including corporate video, government, and education. See chart below for the breakdown. Respondents were asked 15 questions about their cloud storage usage today and in the future, and for what purpose. The survey queried what motivated their move to the cloud, their expectations for access times and cost, and any obstacles that are preventing further cloud adoption.
Survey Insights — Half Use Public Cloud Today — Cloud the Top Choice for Archive
Overall, the survey reveals growing cloud adoption for media organizations who want to improve production efficiency and to reduce costs. Key findings from the report include:
On the whole, about half of the respondents from all organization types are using public cloud services. Sixty-four percent of production/post studio respondents say they currently use the cloud. Broadcasters report lower adoption, with only 26 percent using the public cloud.
Achieving greater efficiency in production was cited by all respondents as the top reason for adopting the cloud. However, while this is also important to broadcasters, their top motivator for cloud use is cost containment or internal savings programs.
Cloud storage is clearly the top choice for archiving media assets, with 70 percent choosing the public cloud for active, deep, or very deep archive needs.
Concerns over the security of assets stored in a public cloud remain; however, they have been greatly assuaged compared to the 2015 report, so much so that security is no longer the top obstacle to cloud adoption. For 40% of respondents, pricing has replaced security as the top concern.
With NAB 2019 only days away, the Backblaze team is excited to launch into the world’s largest event for creatives, and our biggest booth yet!
Must See — Backblaze Booth
This year we’ll be celebrating some of the phenomenal creative work by our customers, including American Public Television, Crisp Video, Falcons’ Digital Creative, WunderVu, and many more.
We’ll have workflow experts standing by to chat with you about your workflow frustrations, and how Backblaze B2 Cloud Storage can be the key to unlocking efficiency and solving storage challenges throughout your entire workflow: From Action! To Archive. With B2, you can focus on creating and managing content, not managing storage.
Create: Bring Your Story to Life
Stop by our booth and we can show you how you can protect your content from ingest through work-in-process by syncing seamlessly to the cloud. We can also detail how you can improve team collaboration and increase content reuse by organizing your content with one of our MAM integrations.
Distribute: Share Your Story With the World
Our experts can show you how B2 can help you scale your content library instantly and indefinitely, and avoid the hassle and expense of on-premises storage. We can demonstrate how everything in your content library can be served directly from your B2 account or through our content delivery partners like Cloudflare.
Preserve: Make Sure Your Story Lives Forever
Want to see the math behind the first cloud storage that’s more affordable than LTO? We can step through the numbers. We can also show you how B2 will keep your archived content accessible, anytime, and anywhere, through a web browser, API calls, or one of our integrated applications listed below.
Must See — Workflow Integrations You Can Count On
Our fantastic workflow partners are a critical part of your creative workflow backed by Backblaze — and there’s a lot of partner news to catch up on!
Drop by our booth to pick up a handy map to help you find Backblaze partners on the show floor including:
Backup and Archive Workflow Integrations
Archiware P5, booth SL15416
SyncBackPro, Wynn Salon J
File Transfer Acceleration, Data Wrangling, Data Movement
Monday morning we’re delivering a presentation in the Scale Logic Knowledge Zone, and Tuesday night of NAB we’re honored to help sponsor the all-new Faster Together event that replaces the long-standing Las Vegas Creative User Supermeet event.
We’ll be raffling off a Hover2 4K drone powered by AI to help you get that perfect drone shot for your next creative film! So after the NAB show wraps up on Tuesday, head over to the Rio main ballroom for a night of mingling with creatives and amazing talks by some of the top editors, colorists, and VFX artists in the industry.
ProVideoTech and Backblaze at the Scale Logic Knowledge Zone
Monday, April 8 at 11 AM, Scale Logic Knowledge Zone, NAB Booth SL111109
On Monday of NAB, Backblaze and PVT will deliver a live presentation for NAB attendees on how to build hybrid-cloud workflows with Cantemo and Backblaze.
Businesses want to migrate away from their current archive solutions for a wide range of reasons: managing risk, concerns over legacy hardware, media degradation, and format support. Many businesses also find themselves stuck with closed-format solutions that are based on legacy middleware with escalating support costs. It is a common problem that we at Ortana have helped many clients overcome through smart and effective use of the many storage solutions available on the market today. As founder and CEO of Ortana, I want to share some of our collective experience around this topic and how we have found success for our clients.
First, we often forget how quickly the storage landscape changes. Let’s take a typical case.
It’s Christmas 2008 and a CTO has just finalised the order on their new enterprise-grade hierarchical storage management (HSM) system with an LTO-4 tape robot. Beyonce’s Single Ladies is playing on the radio, GPS on phones has just started to be rolled out, and there is this new means of deploying mobile apps called the Apple App Store! The system purchased is from a well established, reputable company and provides peace of mind and scalability — what more can you ask for? The CTO goes home for the festive season — job well done — and hopes Santa brings him one of the new Android phones that have just launched.
Ten years on, the world is very different, and Moore’s law tells us that the pace of technological change is only set to increase. That growing archive has remained on the same hardware, controlled by the same HSM, and has gone through one or two expensive LTO format changes. “These migrations had to happen,” the CTO concedes, as support for the older LTO formats was being dropped by the hardware supplier. Their whole content library had to be restored and archived back to the new tapes. New LTO formats also required new versions of the HSM, and whilst these often included new features (better codec support, intelligent repacking, and reporting), the fundamentals of the system remained: closed formats, restricted accessibility, and high cost. Worse still, the annual support costs are increasing whilst new feature development has ground to a halt. Sure, the archive still works, but for how much longer?
Decisions, Decisions, So Many Migration Decisions
As businesses make the painful decision to migrate their legacy archive, the choices of what, where, and how become overwhelming. The storage landscape today is a completely different picture from when closed format solutions went live. This change alone offers significant opportunities to businesses. By combining the right storage solutions with seamless architecture and with lights out orchestration driving the entire process, businesses can flourish by allowing their storage to react to the needs of the business, not constrain them. Ortana has purposefully ensured Cubix (our asset management, automation, and orchestration platform) is as storage agnostic as possible by integrating a range of on-premises and cloud-based solutions, and built an orchestration engine that is fully abstracted from this integration layer. The end result is that workflow changes can be done in seconds without affecting the storage.
As our example CTO would say (no doubt shaking their head whilst saying it), a company’s main priority is to never be in this position again. The key is to store media in an open format, not bound to any one vendor, yet accessible to the business both today and tomorrow. The cost of online cloud storage such as Backblaze has now made storing content in the cloud more cost effective than LTO, and this cost is only set to fall further. This, combined with the ample internet bandwidth that has become ubiquitous, makes cloud storage an obvious primary storage target. Entirely agnostic to the format and codec of the content you are storing, aligned with MPAA best practices, and easily integrated with any on-premises or cloud-based workflow, cloud storage removes many of the issues faced by the closed-format HSMs deployed in so many facilities today. It also begins to change the dialogue over main vs. DR storage, since it’s no longer based at a facility within the business.
Cloud Storage Opens Up New Capabilities
Sometimes people worry that cloud storage will be too slow. Where this is true, it is almost always due to poor cloud implementation. B2 is online, meaning that the time-to-first-byte is almost zero, whereas other cloud solutions such as Amazon Glacier are cold storage, meaning that the time-to-first-byte ranges from at best one to two hours, but in general six to twelve hours. Anything that is to replace an LTO solution needs to match or beat the capacity and speed of the incumbent solution, and good workflow design can ensure that restores are done as promptly as possible and direct to where the media is needed.
But what about those nasty egress costs? People can get caught off guard when this is not budgeted for correctly, or when their workflow does not make good use of simple solutions such as proxies. Regardless of whether your archive is located on LTO or in the cloud, proxies are critical to keeping accessibility up and costs and restore times down. By default, when we deploy Cubix for clients we always generate a frame accurate proxy for video content, often devalued through the use of burnt-in timecode (BITC), logos, and overlays. Generated using open source transcoders, they are incredibly cost effective to generate and are often only a fraction of the size of the source files. These proxies, which can also be stored and served directly from B2 storage, are then used throughout all our portals to allow users to search, find, and view content. This avoids the time and cost required to restore the high resolution master files. Only when the exact content required is found is a restore submitted for the full-resolution masters.
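To put rough numbers on the egress point, here is a hedged back-of-the-envelope sketch in Python. The bitrates and clip counts are purely illustrative; $0.01/GB was B2’s download list price at the time of writing:

```python
def egress_cost(gb, price_per_gb=0.01):
    """Download cost at B2's $0.01/GB list price (as of this writing)."""
    return gb * price_per_gb

# Illustrative numbers: a 200 Mbps master vs. a 2 Mbps proxy makes the
# proxy roughly 1% of the master's size.
master_gb = 100.0
proxy_gb = master_gb * (2 / 200)

# Browsing ten candidate clips via proxies, then restoring the one
# master actually needed, versus restoring all ten masters to review:
proxy_first = egress_cost(10 * proxy_gb + 1 * master_gb)
restore_all = egress_cost(10 * master_gb)
print(proxy_first, restore_all)
```

Even with these made-up figures, the proxy-first search pattern cuts the egress bill by roughly an order of magnitude, which is why we generate proxies by default.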
Multiple Copies Stored at Multiple Locations by Multiple Providers
Moving content to the cloud doesn’t remove the risk of working with a single provider, however. No matter how good or big they are, it’s always a wise idea to ensure an active disaster recovery solution is present within your workflows. This last resort copy does not need all the capabilities of the primary storage, and can even be more punitive when it comes to restore costs and times. But it should be possible to enable in moments, and be part of the orchestration engine rather than being a manual process.
Where businesses need to de-risk that single provider, or for workflows where 30-40% of the original content has to be regularly restored (because proxies do not meet the needs of the workflow), on-premises archive solutions can still be deployed without being caught in the issues discussed earlier. Firstly, LTO now offers portability benefits through LTFS, an easy-to-use open format whose specification and implementation are, critically, in the public domain. This ensures it is easily supported by many vendors and guarantees support longevity for on-premises storage. Ortana, with its Cubix platform, supports many HSMs that can write content in native LTFS format, readable by any standalone drive from any vendor supporting LTFS.
Also, with 12 TB hard drives now standard in the marketplace, nearline based storage has also become a strong contender for content when combined with intelligent storage tiering to the cloud or LTO. Cubix can fully automate this process, especially when complemented by such vendors as GB Labs’ wide range of hardware solutions. This mix of cloud, nearline and LTO — being driven by an intelligent MAM and orchestration platform like Cubix to manage content in the most efficient means possible on a per workflow basis — blurs the lines between primary storage, DR, and last resort copies.
Streamlining the Migration Process
Once you have your storage mix agreed upon and in place, the fraught task is getting your existing library onto the new solution whilst not impacting access to the business. Some HSM vendors suggest swapping your LTO tapes by physically removing them from one library and inserting them into another. Ortana knows that libraries are often the linchpin of the organisation, and any downtime has a significant negative impact that can fill media managers with dread, especially since these one-shot, one-direction migrations can easily go wrong. Moreover, simply moving tapes does not persist any editorial metadata or meet many of the objectives around making content more available. Cubix not only manages the media and the entire transformation process, but also retains the editorial metadata from the existing archive.
Given the high speeds that LTO delivers, combined with the scalability of Cubix, the largest libraries can be migrated in short timescales, whilst having zero downtime on the archive. Whilst the content is being migrated to the defined mix of storage targets, Cubix can perform several tasks on the content to further augment the metadata, including basics such as proxy and waveform generation, through to AI based image detection and speech to text. Such processes only further reduce the time spent by staff looking for content, and further refine the search capability to ensure only that content required is restored — translating directly to reduced restore times and egress costs.
A Real-World Customer Example
Many of the above concerns and considerations led a large broadcaster to Ortana for a large-scale migration project. The broadcaster produces in-house news and post production with multi-channel linear playout and video-on-demand (VoD). Their existing archive was 3 PB of media across two generations of LTO tape managed by Oracle DIVArchive & DIVADirector. They were concerned about on-going support for DIVA and wanted to fully migrate all tape and disk-based content to a new HSM in an expedited manner, making full use of the dedicated drive resources available.
Their primary goal was to fully migrate all editorial metadata into Cubix, including all ancillary files (subtitles, scripts, etc.), and to index all media using AI-powered content discovery to reduce search times for the news, promos, and sports departments at the same time. They also wanted to replace the legacy Windows Media Video (WMV) proxies with new full HD H.264 frame accurate proxies, and to provide the business with secure, group-based access to the content. Finally, they wanted all the benefits of cloud storage, whilst keeping costs to a minimum.
With Ortana’s Cubix Core, the broadcaster was able to safely migrate their DIVArchive content to two storage platforms: LTFS with a Quantum HSM system and Backblaze B2 cloud storage. Their content was indexed via AI-powered image recognition (Google Vision) and speech to text (Speechmatics) during the migration process, and the Cubix UI replaced the existing archive as the media portal for both internal and external stakeholders.
The new solution has vastly reduced the timescales for content processing across all departments, and has led to a direct reduction in staff costs. Researchers report a 50-70% reduction in time spent searching for content, and the archive shows a 40% reduction in restore requests. By having the content located in two distinct geographical locations they’ve entirely removed their business risk of having their archive with a single vendor and in a single location. Most importantly, their archived content is more active than ever and they can be sure it will stay alive for the future.
How exactly did Ortana help them do it? Join our webinar, Evading Extinction: Migrating Legacy Archives, on Thursday, March 28, 2019. We’ll detail all the steps we took in the process and include a live demo of Cubix. We’ll show you how straightforward and painless an archive migration can be with the right strategy, the right tools, and the right storage.
— James Gibson, Founder & CEO, Ortana Media Group
• • •
Backblaze will be exhibiting at NAB 2019 in Las Vegas on April 8-11, 2019. Schedule a meeting with our cloud storage experts to learn how B2 Cloud Storage can streamline your workflow today!
Whatever your creative venture, the byproduct of all your creative effort is assets. Whether you produce music, images, or video, as you produce more and more of these valuable assets, they tend to pile up and become difficult to manage, organize, and protect. As your creative practice evolves to meet new demands, and the scale of your business grows, you’ll often find that your current way of organizing and retrieving assets can’t keep up with the pace of your production.
For example, if you’ve been managing files by placing them in carefully named folders, getting those assets into a media asset management system will make them far easier to navigate and much easier to pull out exactly the media you need for a new project. Your team will be more efficient and you can deliver your finished content faster.
As we’ve covered before, putting your assets in a type of storage like B2 Cloud Storage ensures that they will be protected in a highly durable and highly available way that lets your entire team be productive.
You can learn about some of the new capabilities of the latest cloud-based collaboration tools here:
With some smart planning, and a little bit of knowledge, you can be prepared to get the most of your assets as you move them into an asset management system, or when migrating from an older or less capable system into a new one.
Assets and Metadata
Before we can build some playbooks to get the most from your creative assets, let’s review a few key concepts.
Asset — a rich media file with intrinsic metadata.
An asset is simply a file that is the result of your creative operation, most often a rich media file like an image or a video. Typically, these files are captured or created in a raw state, then your creative team adds value to that raw asset by editing it together with other assets to create a finished story that, in turn, becomes another asset to manage.
Metadata — Information about a file, either embedded within the file itself or associated with the file by another system, typically a media asset management (MAM) application.
The file carries information about itself that can be understood by your laptop or workstation’s operating system. Some of these seem obvious, like the name of the file, how much storage space it occupies, when it was first created, and when it was last modified. These would all be helpful ways to try to find one particular file you are looking for among thousands just using the tools available in your OS’s file manager.
There’s usually another level of metadata embedded in media files that is not so obvious but potentially enormously useful: metadata embedded in the file when it’s created by a camera, film scanner, or output by a program.
For example, this image taken in Backblaze’s data center a few years ago carries all kinds of interesting information. When I inspect the file with macOS Finder’s Get Info, a wealth of information is revealed: I can tell not only the image’s dimensions and when it was taken, but also exactly what kind of camera took this picture and the lens settings that were used.
As you can see, this metadata could be very useful if you want to find all images taken on that day, or even images taken with that same camera, focal length, F-stop, or exposure.
When a File and Folder System Can’t Keep Up
Inspecting files one at a time is useful, but a very slow way to determine if a file is the one you need for a new project. Yet many creative environments that don’t have a formal asset management system get by with an ad hoc system of file and folder structures, often kept on the same storage used for production or even on an external hard drive.
Teams quickly outgrow that system when they find that their work spills over to multiple hard drives, or takes up too much space on their production storage. Worst of all, assets kept on a single hard drive are vulnerable to disk damage, or to being accidentally copied or overwritten.
Why Your Assets Need to be Managed
To meet this challenge, creative teams have often turned to a class of application called a Media Asset Manager (MAM). A MAM automatically extracts all their assets’ inherent metadata, helps move files to protected storage, and makes them instantly available to their entire team. In a way, these media asset managers become a private media search engine where any file attribute can be a search query to instantly uncover the file they need in even the largest media asset libraries.
Beyond that, asset management systems are rapidly becoming highly effective collaboration and workflow tools. For example, tagging a series of files as Field Interviews — April 2019, or flagging an edited piece of content as HOLD — do not show customer can be very useful indeed.
The Inner Workings of a Media Asset Manager
When you add files into an asset management system, the application inspects each file, extracting every available bit of information about the file, noting the file’s location on storage, and often creating a smaller stand-in or proxy version of the file that is easier to present to users.
To keep track of this information, asset manager applications employ a database and keep information about your files in it. This way, when you’re searching for a particular set of files among your entire asset library, you can simply make a query of your asset manager’s database in an instant rather than rifling through your entire asset library storage system. The application takes the results of that database query and retrieves the files you need.
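A toy version of that inner working can be sketched with Python’s built-in sqlite3. The schema, field names, and asset records below are all illustrative, not any particular MAM’s format:

```python
import sqlite3

# One row per asset, with metadata extracted at ingest plus the
# storage location and the proxy's location. All values are made up.
db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE assets (
    path TEXT, size_bytes INTEGER, camera TEXT,
    captured_on TEXT, proxy_path TEXT)""")
db.executemany("INSERT INTO assets VALUES (?,?,?,?,?)", [
    ("b2://bucket/raw/interview_01.mov", 31_000_000_000, "RED Helium",
     "2019-04-02", "/proxies/interview_01.mp4"),
    ("b2://bucket/raw/broll_07.mov", 9_500_000_000, "Sony FS7",
     "2019-04-02", "/proxies/broll_07.mp4"),
    ("b2://bucket/raw/interview_02.mov", 28_000_000_000, "RED Helium",
     "2019-04-03", "/proxies/interview_02.mp4"),
])

# "Everything shot on the RED on April 2" is one indexed query,
# not a crawl of the storage system itself.
hits = db.execute("""SELECT path FROM assets
    WHERE camera = 'RED Helium' AND captured_on = '2019-04-02'""").fetchall()
print(hits)
```

Real asset managers add proxy generation, access control, and workflow states on top, but the core pattern is the same: query a small database, then fetch only the files the query returns.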
The Asset Migration Playbook
Whether you need to move from a file and folder based system to a new asset manager, or have been using an older system and want to move to a new one without losing all of the metadata that you have painstakingly developed, a sound playbook for migrating your assets can help guide you.
Play 1 — Getting Assets in Files and Folders Protected Without an Asset Management System
In this scenario, your assets are in a set of files and folders, and you aren’t ready to implement your asset management system yet.
The first consideration is for the safety of the assets. Files on a single hard drive are vulnerable, so if you are not ready to choose an asset manager your first priority should be to get those files into a secure cloud storage service like Backblaze B2.
Then, when you have chosen an asset management system, you can simply point the system at your cloud-based asset storage to extract the metadata of the files and populate the asset information in your asset manager.
Get assets archived or moved to cloud storage
Choose your asset management system
Ingest assets directly from your cloud storage
Play 2 — Getting Assets in Files and Folders into Your Asset Management System Backed by Cloud Storage
In this scenario, you’ve chosen your asset management system, and need to get your local assets in files and folders ingested and protected in the most efficient way possible.
You’ll ingest all of your files into your asset manager from local storage, then archive them to cloud storage. Once your asset manager has been configured with your cloud storage credentials, it can automatically move a copy of local files to the cloud for you. Later, when you have confirmed that the file has been copied to the cloud, you can safely delete the local copy.
Ingest assets from local storage directly into your asset manager system
From within your asset manager system archive a copy of files to your cloud storage
Once safely archived, the local copy can be deleted
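The “confirm before you delete” step in this play can be scripted as a checksum comparison. B2 records a SHA-1 for each uploaded file; the sketch below is not a real B2 client, and the helper names are ours:

```python
import hashlib
from pathlib import Path

def sha1_of(path: Path) -> str:
    """Hex SHA-1 of a local file, streamed in 1 MB chunks."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def safe_to_delete(local: Path, cloud_sha1: str) -> bool:
    # Only delete the local copy once the checksum recorded for the
    # archived copy matches what we compute locally.
    return sha1_of(local) == cloud_sha1
```

In practice you would fetch the stored checksum from your asset manager or the cloud file’s metadata, and only remove the local file when `safe_to_delete` returns True.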
Play 3 — Getting a Lot of Assets on Local Storage into Your Asset Management System Backed by Cloud Storage
If you have a lot of content, more than, say, 20 terabytes, you will want to use a rapid ingest service similar to Backblaze’s Fireball system. You copy the files to Fireball, Backblaze puts them directly into your asset management bucket, and the asset manager is then updated with each file’s new location in your Backblaze B2 account.
This can be a manual process, or can be done with scripting to make the process faster.
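The scripting for this play is mostly path rewriting: each asset record’s local location becomes its new B2 location. A hedged sketch, assuming the Fireball copy preserved your directory layout (the paths and bucket name are made up):

```python
from pathlib import PurePosixPath

def relink(local_path: str, local_root: str, bucket: str) -> str:
    """Rewrite a local archive path to its post-ingest B2 location.

    Assumes the bulk upload preserved the directory layout below
    local_root; the b2:// notation is just our shorthand here.
    """
    rel = PurePosixPath(local_path).relative_to(local_root)
    return f"b2://{bucket}/{rel}"

print(relink("/mnt/archive/2018/promo_03.mov", "/mnt/archive", "studio-archive"))
```

You would run this over every record in the asset manager’s database, replacing the local path with the rewritten one once the upload is confirmed.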
Play 4 — Moving from One Asset Manager System to a New One Without Losing Metadata
In this scenario you have an existing asset management system and need to move to a new one as efficiently as possible to not only take advantage of your new system’s features and get files protected in cloud storage, but also to do it in a way that does not impact your existing production.
Some asset management systems will allow you to export the database contents in a format that can be imported by a new system. Some older systems may not have that luxury and will require a database expert to manually extract the metadata. Either way, you can expect to need to map the fields from the old system to the fields in the new system.
Making a copy of the old database is a must. Don’t work on the primary copy, and be sure to conduct tests on small groups of files as you migrate from the older system to the new one. You need to ensure that the metadata is correct in the new system, paying special attention that the actual file locations are mapped properly. It’s wise to keep the old system up and running for a while before completely phasing it out.
Export the database from the old system
Import the records into the new system
Ensure that the metadata is correct in the new system and file locations are working properly
Make archive copies of your files to cloud storage
Once the new system has been running through a few production cycles, it’s safe to power down the old system
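The field mapping called for in this play is usually the bulk of the scripting work. A minimal sketch, with entirely hypothetical field names on both sides:

```python
# Hypothetical map from the old MAM's exported field names to the new
# system's schema; both sets of names are made up for illustration.
FIELD_MAP = {
    "Title": "title",
    "TapeLocation": "legacy_location",
    "ObjectName": "file_name",
    "Category": "collection",
}

def migrate_record(old: dict) -> dict:
    """Translate one exported record into the new system's shape."""
    new = {FIELD_MAP[k]: v for k, v in old.items() if k in FIELD_MAP}
    # Fields the old system never tracked get explicit defaults so the
    # new system's import validation doesn't reject the record.
    new.setdefault("cloud_location", None)
    return new

print(migrate_record({"Title": "ACL Ep. 4401", "ObjectName": "acl_4401.mxf"}))
```

Running a handful of exported records through a function like this, and eyeballing the output before a bulk import, is exactly the kind of small-batch testing recommended above.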
Play 5 — Moving Quickly from an Asset Manager System on Local Storage to a Cloud-based System
In this variation of Play 4, you can move content to object storage with a rapid ingest service like Backblaze Fireball at the same time that you migrate to a cloud-based system. This step will benefit from scripting: create records in your new system with all of your metadata, then relink them to the actual file locations in your cloud storage, all in one pass.
You should test that your asset management system can recognize a file already in the system without creating a duplicate copy of the file. This is done differently by each asset management system.
Export the database from the old system
Import the records into the new system while creating placeholder records with the metadata only
Archive your local assets to Fireball (up to 70 TB at a time)
Once the files have been uploaded by Backblaze, relink the cloud based location to the asset record
Every production environment is different, but we all need the same thing: to be able to find and organize our content so that we can be more productive and rest easy knowing that our content is protected.
These plays will help you take that step and be ready for any future production challenges and opportunities.
If you’d like more information about media asset manager migration, join us for our webinar on March 15, 2019:
If you make copies of your images or video files for safekeeping, are you backing them up or archiving them? It’s been discussed many times before, but the short answer is that it depends on the function of the copy. For media workflows, a crisp understanding is required in order to implement the right tools. In today’s post, we’ll explore the nuances between backup and archiving in media workflows and provide a real world application from UCSC Silicon Valley.
We explored the broader topic of backing up versus archiving in our What’s the Diff: Backup vs Archive post. It’s a backup if you copy data to keep it available in case of loss, while it’s an archive if you make a copy for regulatory compliance, or to move older, less-used data off to cheaper storage. Simple, right? Not if you’re talking about image, video and other media files.
Backup vs. Archive for Professional Media Productions
Traditional definitions don’t fully capture how backup and archive typically operate in professional media workflows compared to business operations. Video and images aren’t typical business data in a number of ways, and that profoundly impacts how they’re protected and preserved throughout their lifecycle. With media backup there are key differences in which files get backed up and how they get backed up. With media archive there are key differences in when files get archived and why they’re archived.
Large Media File Sizes Slow Down Backup
The most obvious nuance is that media files are BIG. While most business documents are under 30 MB in size, a single second of video can be larger than 30 MB at higher resolutions and frame rates. Backing up such large files can take longer than the traditional backup windows of overnight for incremental backups and a weekend for a full backup. And you can’t expect deduplication to shorten backup times or reduce backup sizes, either: video and images don’t dedupe well.
Meanwhile, the editing process generates a flurry of intermediate or temporary files in the active content creation workspace that don’t need to be backed up because they can be easily regenerated from source files.
The best backup solutions for media allow you to specify exactly which directories and file types you want backed up, so that you’re taking time for and paying for only what you need.
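To picture this selective backup, think of it as a filter over the project tree. The sketch below is illustrative only: the directory names and exclusion patterns are hypothetical stand-ins for whatever rules your backup tool actually supports.

```python
import fnmatch
from pathlib import Path

# Hypothetical rules: back up source media, skip regenerable intermediates.
INCLUDE_DIRS = {"source", "final_cut"}
EXCLUDE_PATTERNS = ["*.tmp", "*.cache", "*Autosave*"]

def should_back_up(path: Path) -> bool:
    """Return True if the file lives in a tracked directory and is not an intermediate."""
    if not any(part in INCLUDE_DIRS for part in path.parts):
        return False
    return not any(fnmatch.fnmatch(path.name, pat) for pat in EXCLUDE_PATTERNS)

print(should_back_up(Path("source/interview_cam_a.mov")))  # True
print(should_back_up(Path("scratch/render_0001.tmp")))     # False
```

The point is that editing intermediates tend to match a few predictable patterns, so they can be skipped wholesale while irreplaceable source media is always captured.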
Archiving to Save Space on Production Storage
Another difference is that archiving to reduce production storage costs is much more common in professional media workflows than with business documents, which are more likely to be archived for compliance. High-resolution video editing in particular requires expensive, high-performance storage to deliver multiple streams of content to multiple users simultaneously without dropping frames. With the large file sizes that come with high-resolution content, this expensive resource fills up quickly with content not needed for current productions. Archiving completed projects and infrequently used assets can keep production storage capacities under control.
Media asset managers (MAMs) can simplify the archive and retrieval process. Assets can be archived directly through the MAM’s visual interface, and after archiving, their thumbnail or proxies remain visible to users. Archived content remains fully searchable by its metadata and can also be retrieved directly through the MAM interface. For more information on MAMs, read What’s the Diff: DAM vs MAM.
Strategically archiving select media files to less expensive storage allows facilities to stay within budget, and when done properly, keeps all of your content readily accessible for new projects and repurposing.
Permanently Secure Source Files and Raw Footage on Ingest
A less obvious way that media is different is that video files are fixed content that don’t actually change during the editing process. Instead, editing suites compile changes to be made to the original and apply the changes only when making the final cut and format for delivery. Since these source files are not going to change, and are often irreplaceable, many facilities save a copy to secondary storage as soon as they’re ingested to the workflow. This copy serves as a backup to the file on local storage during the editing process. Later, when the local copy is no longer actively being used, it can be safely deleted knowing it’s secured in the archive. I mean backup. Wait, which is it?
Whether you call it archive or backup, make a copy of source files in a storage location that lives forever and is accessible for repurposing throughout your workflow.
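A minimal sketch of that ingest-time copy is below. A local directory stands in for the cloud bucket (a real workflow would upload instead of copy), and a checksum verifies the copy; SHA-1 is used because that is the checksum B2 reports for uploaded files.

```python
import hashlib
import shutil
from pathlib import Path

def sha1_of(path: Path) -> str:
    """Compute the file's SHA-1, the checksum B2 reports for uploads."""
    h = hashlib.sha1()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def secure_on_ingest(source: Path, archive_root: Path) -> str:
    """Copy a newly ingested source file to secondary storage and verify it.

    The local archive_root directory is a stand-in for a cloud bucket;
    substitute an upload call in a real pipeline.
    """
    archive_root.mkdir(parents=True, exist_ok=True)
    dest = archive_root / source.name
    shutil.copy2(source, dest)
    original, copied = sha1_of(source), sha1_of(dest)
    if original != copied:
        raise IOError(f"checksum mismatch for {source.name}")
    return copied
```

Once the checksum of the secured copy matches the original, the local copy can later be deleted with confidence.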
To see how all this works in the real world, here’s how UCSC Silicon Valley designed a new solution that integrates backup, archive, and asset management with B2 cloud storage so that their media is protected, preserved and organized at every step of their workflow.
How UCSC Silicon Valley Secured Their Workflow’s Data
UCSC Silicon Valley built a greenfield video production workflow to support UC Scout, the University of California’s online learning program that gives high school students access to the advanced courses they need to be eligible and competitive for college. Three teams of editors, producers, graphic designers and animation artists — a total of 22 creative professionals — needed to share files and collaborate effectively, and digital asset manager Sara Brylowski was tasked with building and managing their workflow.
Sara and her team had specific requirements. For backup, they needed to protect active files on their media server with an automated backup solution that allowed accidentally deleted files to be easily restored. Then, to manage storage capacity more effectively on their media server, they wanted to archive completed videos and other assets that they didn’t expect to need immediately. To organize content, they needed an asset manager with seamless archive capabilities, including fast self-service archive retrieval.
They wanted the reliability and simplicity of the cloud to store both their backup and archive data. “We had no interest in using LTO tape for backup or archive. Tape would ultimately require more work and the media would degrade. We wanted something more hands off and reliable,” Sara explained. The cloud choice was narrowed to Backblaze B2 or Amazon S3. Both were proven cloud solutions that were fully integrated with the hardware and software tools in their workflow. Backblaze was chosen because its $5 per terabyte per month pricing was a fraction of the cost of Amazon S3.
Removing Workflow Inefficiencies with Smarter Backup and Archive
The team had previously used the university’s standard cloud backup service to protect active files on the media server as they worked on new videos. But because that cloud backup was designed for traditional file servers, it backed up everything, even the iterative files generated by video production tools like Adobe Premiere, After Effects, Maya, and Cinema 4D that didn’t need to be backed up. For this reason, Sara pushed to drop the university’s backup provider. It was expensive in large part because it was saving all of this noise in perpetuity.
“With our new workflow we can manage our content within its life cycle and at the same time have reliable backup storage for the items we know we’re going to want in the future. That’s allowed us to concentrate on creating videos, not managing storage.”—Sara Brylowski, UCSC Silicon Valley
After the team created thousands of videos for 65 online courses, their media server was quickly filling to its 128 TB capacity. They needed to archive data from completed projects to make room for new ones, sooner rather than later. Deploying a MAM solution would simplify archiving, while also helping them organize their diverse and growing library of assets — video shot in studio, B-roll, licensed images, and audio from multiple sources.
Modern Storage Workflows in the Age of Cloud, Part 2
In Modern Storage Workflows in the Age of Cloud, Part One, we introduced a powerful maxim to guide content creators (anyone involved in video or rich media production) in choosing storage for the different parts of their content creation workflows:
Choose the storage that best fits each workflow step.
It’s true that every video production environment is different, with different needs, and the ideal solution for an independent studio of a few people is different than the solution for a 50-seat post-production house. But the goal of everyone in the business of creative storytelling is to tell stories and let vision and craft shine through. Anything that makes that job more complicated and more frustrating keeps you from doing your best work.
Given how prevalent, useful, and inexpensive cloud technologies are, almost every team today is rapidly finding they can jettison whole classes of storage that are complicating their workflow and instead focus on two main types of storage:
Fast, shared production storage to support editing for content creation teams (with no need to oversize or overspend)
Active, durable, and inexpensive cloud storage that lets you move all of your content into one protected, accessible place — your cloud-enabled content backplane
It turns out there’s another benefit unlocked when your content backplane is cloud enabled, and it’s closely tied to another production maxim:
Organizing content in a single, well-managed repository makes that content more valuable as you use it.
When all content is in a single place, well-managed and accessible, content gets discovered faster and used more. Over time it will pick up more metadata, with sharper and more refined tags. A richer context is built around the tags, making it more likely that the content you already have will get repurposed for new projects.
Later, when you come across a large content repository to acquire, or contemplate a digitization or preservation project, you know you can bring it into the same content management system you’ve already refined, concentrating and increasing value further still.
Having more content that grows increasingly valuable over time becomes a monetization engine for licensing, content personalization, and OTT delivery.
You might think that these benefits already present a myriad of new possibilities, but cloud technologies are ready to accelerate the benefits even further.
Cloud Benefits — Pay as You Need It, Scalability, and Burstability
It’s worth recapping the familiar cost-based benefits of the cloud: 1) pay only for the resources you actually use, and only as long as you need them, and, 2) let the provider shoulder the expense of infrastructure support, maintenance, and continuous improvement of the service.
The cost savings from the cloud are obvious, but its scalability and flexibility should weigh just as heavily when you compare using the cloud with running infrastructure yourself. If you were responsible for a large server and storage system, how would you cope with a business doubling every quarter, or with merging with another team for a big project?
Too many production houses end up disrupting their production workflow (and their revenue) when they are forced to beef up servers and storage capability to meet new production demands. Cloud computing and cloud storage offer a better solution. It’s possible to instantly bring on new capacity and capability, even when the need is unexpected.
Cloud Delivered Compute Horsepower on Demand
Let’s consider the example of a common task like transcoding content and embedding a watermark. You need to process all 172,800 frames of a two-hour movie (7,200 seconds at 24 frames per second) to resize the frame and add a watermark, and that compute workload takes 100 minutes and ties up a single server.
You could adapt that workflow to the cloud by pulling high resolution frames from cloud storage, feeding them to 10 cloud servers in parallel, and completing the same job in 10 minutes. Another option is to spin up 100 servers and get the job done in one minute.
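The arithmetic behind those numbers is simple linear scaling, sketched below. Real jobs add some overhead for splitting frames across servers and merging results, so treat this as an upper bound on the speedup.

```python
def wall_clock_minutes(single_server_minutes: float, servers: int) -> float:
    """Ideal linear scaling: the job divides evenly across identical servers.

    Ignores the overhead of distributing frames and reassembling output,
    so real wall-clock time will be somewhat higher.
    """
    return single_server_minutes / servers

# The transcode-and-watermark job from the example: 100 minutes on one server.
for n in (1, 10, 100):
    print(f"{n:>3} servers -> {wall_clock_minutes(100, n):g} min")
```

The payoff is the shape of the curve: every 10x in servers is a 10x cut in turnaround, for roughly the same total compute bill.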
The cloud provides the flexibility to cut workflow steps that used to take hours down to minutes by adding the compute horsepower needed for the job, then turning it off when it’s no longer needed. You don’t need to plan ahead or pay for ongoing maintenance. In short, compute adapts to your workflow rather than the other way around, freeing you to make workflow choices that prioritize the creative need.
Your Workflow Applications Are Moving to the Cloud, Too
More and more of the applications used for content creation and management are moving to the cloud, as well. Modern web browsers are gaining astonishing new capabilities and there is less need for dedicated application servers accompanying storage.
What’s important is that the application helps you in the creative process, not the mechanics of how the application is served. Increasingly, this functionality is delivered by virtual machines that can be spun up by the thousands as needed or by cloud applications that are customized for each customer’s specific needs.
An example of a cloud-delivered workflow application — iconik asset discovery and project collaboration
iconik is one example of such a service. iconik delivers cloud-based asset management and project collaboration as a service. Instead of dedicated servers and storage in your data center, each customer has their own unique installation of iconik’s service that’s ready in minutes from first signup. The installation is exclusive to your organization and tailored to your needs. The result is a combination of virtual machines, compute, and storage that matches your workflow with just the resources you need. The resources are instantly available whenever and wherever your team is using the system, and consume nothing when it is not.
Here’s an example. A video file can be pumped from Backblaze B2 to the iconik application running on a cloud compute instance. The proxies and asset metadata are stored in one place and available to every user. This approach scales to as many assets and productions as you can throw at it, and to as many people as are collaborating on the project.
The service is continuously upgraded and updated with new features and improvements as they become available, without the delay of rolling out enhancements and patches to different customers and locations.
Given the advantages of the cloud, we can expect that more steps in the creative production workflow that currently rely on dedicated on-site servers will move to the highly agile and adaptable environment offered by the cloud.
The Next Evolution — AI Becomes Content-Aware
Having your content library in a single content backplane in the cloud provides another benefit: ready access to a host of artificial intelligence (AI) tools.
Examples of AI Tools That Can Improve Creative Production Workflows:
Text to speech transcription
Object recognition and tagging
Brand use recognition
High resolution conversion
AI tools can be viewed as compute workers that develop processing rules by training for a desired result on a data set. An AI tool can be trained by having it process millions of images until it can tell the difference between sky and grass, or pick out a car in a frame of video. Once such a tool has been trained, it provides an inexpensive way to add valuable metadata to content, letting you find, for example, every video clip across your entire library that has sky, or grass, or a car in it. Text keywords with an associated timecode can be automatically added to aid in quickly zeroing in on a specific section of a long video clip. That’s something that’s not practical for a human content technician over thousands of files, but is easy, repeatable, and scalable for an AI tool.
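What those time-coded keywords might look like as data is sketched below. The clip names and tags are hypothetical, standing in for whatever your AI tagging pass emits, but the structure (a keyword anchored to a clip and a timecode) is the essence of what makes library-wide search possible.

```python
from dataclasses import dataclass

@dataclass
class TimecodedTag:
    """One AI-generated keyword anchored to a point in a clip (in seconds)."""
    clip: str
    keyword: str
    seconds: float

# Hypothetical output from an object-recognition pass over two clips.
tags = [
    TimecodedTag("beach_day.mov", "sky", 3.0),
    TimecodedTag("beach_day.mov", "car", 41.5),
    TimecodedTag("road_trip.mov", "car", 12.2),
    TimecodedTag("road_trip.mov", "grass", 80.0),
]

def find(keyword: str, library: list) -> list:
    """Return (clip, timecode) pairs so an editor can jump straight to the shot."""
    return [(t.clip, t.seconds) for t in library if t.keyword == keyword]

print(find("car", tags))  # [('beach_day.mov', 41.5), ('road_trip.mov', 12.2)]
```

In practice a MAM stores these records alongside the asset, so a search for "car" returns not just matching clips but the exact seconds to scrub to.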
Let AI Breathe New Life into Existing Content
AI tools can breathe new life into older content, intelligently cleaning up older format source video by removing film scratches or up-resing content to today’s higher resolution formats. They can be valuable for digital restoration and preservation projects, too. With AI tools and source content in the cloud, it’s now possible to revive analog source footage: digitize it, let AI clean it up, and you’ll get fresh, monetizable assets in your library.
An example of the time-synched tags that can be generated with an AI tool
Many workflow tools, such as asset and collaboration tools, can use AI tools for speech transcription or smart object recognition, which brings additional capabilities. axle.ai, for example, can connect with a visual search tool to highlight an object in the frame like a wine bottle, letting you subsequently find every shot of a wine bottle across your entire library.
Visual search for brands and products is also possible. Just highlight a brand logo and find every clip where the camera panned over that logo. It’s smart enough to get results even when only part of the logo is shown.
We’ve barely touched on the many tools that can be applied to content on ingest or content already in place. Whichever way they’re applied, they can deliver on the promise of making your workflows more efficient and powerful, and your content more valuable.
All Together Now
Taken together, these trends are great news for creatives. They can serve your creative vision by making your workflow more agile and more efficient. Cloud-enabled technologies enable you to focus on adding value and repurposing content in fresh new ways, resulting in new audiences and better monetization.
By placing your content in a cloud content backplane, and taking advantage of applications as a service, including the latest AI tools, it becomes possible to continually grow your content collection while increasing its value — a desirable outcome for any creative production enterprise.
If you could focus only on delivering great creative content, and had a host of AI tools to automatically make your content more valuable, what would you do?