Tag Archives: Cloud Storage

Announcing Facebook Photo and Video Transfers Direct to Backblaze B2 Cloud Storage

2020-10-22 Jeremy Milk

Post Syndicated from Jeremy Milk original https://www.backblaze.com/blog/facebook-photo-video-transfers-direct-to-cloud-storage/

Facebook pointing to Backblaze

Perhaps I’m dating myself when I say that I’ve been using Facebook for a very long time. So long that the platform is home to many precious photos and videos that I couldn’t imagine losing. And even though they’re mostly shared to Facebook from my phone or other apps, some aren’t. So I’ve periodically downloaded my Facebook albums to my Mac, which I’ve of course set to automatically back up with Backblaze, to ensure they’re safely archived.

And while it’s good to know how to download and back up your social media profile, you might be excited to learn that it’s just become a lot easier: Facebook has integrated Backblaze B2 Cloud Storage directly as a data transfer destination for your photos and videos. This means you can now migrate or copy years of memories in a matter of clicks.

What Data Transfer Means for You

If you use Facebook and want to exercise even greater control over the media you’ve posted there, you’ll find that this seamless integration enables:

Personal safeguarding of images and videos in Backblaze.
Enhanced file sharing and access control options.
Ability to organize, modify, and collaborate on content.

How to Move Your Data to Backblaze B2

Current Backblaze B2 customers can start data transfers within Facebook via Settings & Privacy > Settings / Your Facebook Information / Transfer a Copy of Your Photos or Videos / Choose Destination / Backblaze.

Transfer a Copy of Your Photos or Videos

Transfer a Copy of Your Photos or Videos to Backblaze

If you don’t have a Backblaze B2 account, you can create one here. You’ll need a Key ID and an Application Key when you select Backblaze.

The Data Transfer Project and B2 Cloud Storage

The secure, encrypted data transfer service is based on code Facebook developed through the open-source Data Transfer Project (and you all know we love open-source projects, from our original Storage Pod design to Reed-Solomon erasure coding). Data routed to your B2 Cloud Storage account enjoys our standard $5/TB month pricing with a standard 10GB of free capacity.

Our Co-Founder and CEO, Gleb Budman, noted that this new integration harkens back to our roots: “We’ve been helping people safely store their photos and videos in our cloud for almost as long as Facebook has been providing the means to post content. For people on Facebook who want more choice in hosting their data outside the platform, we’re happy to make our cloud a seamlessly available destination.”

My take: 👍

The post Announcing Facebook Photo and Video Transfers Direct to Backblaze B2 Cloud Storage appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

Backblaze Hard Drive Stats Q3 2020

2020-10-20

Post Syndicated from original https://www.backblaze.com/blog/backblaze-hard-drive-stats-q3-2020/

As of September 30, 2020, Backblaze had 153,727 spinning hard drives in our cloud storage ecosystem spread across four data centers. Of that number, there were 2,780 boot drives and 150,947 data drives. This review looks at the Q3 2020 and lifetime hard drive failure rates of the data drive models currently in operation in our data centers and provides a handful of insights and observations along the way. As always, we look forward to your comments.

Quarterly Hard Drive Failure Stats for Q3 2020

At the end of Q3 2020, Backblaze was using 150,974 hard drives to store customer data. For our evaluation we remove from consideration those drive models for which we did not have at least 60 drives (more on that later). This leaves us with 150,757 hard drives in our review. The table below covers what happened in Q3 2020.

Observations on the Q3 Stats

There are several models with zero drive failures in the quarter. That’s great, but when we dig in a little we get different stories for each of the drives.

The 18TB Seagate model (ST18000NM000J) has 300 drive days and they’ve been in service for about 12 days. There were no out of the box failures which is a good start, but that’s all you can say.
The 16TB Seagate model (ST16000NM001G) has 5,428 drive days which is low, but they’ve been around for nearly 10 months on average. Still, I wouldn’t try to draw any conclusions yet, but a quarter or two more like this and we might have something to say.
The 4TB Toshiba model (MD04ABA400V) has only 9,108 drive days, but they have been putting up zeros for seven quarters straight. That has to count for something.
The 14TB Seagate model (ST14000NM001G) has 21,120 drive days with 2,400 drives, but they have only been operational for less than one month. Next quarter will give us a better picture.
The 4TB HGST (model: HMS5C4040ALE640) has 274,923 drive days with no failures this quarter. Everything else is awesome, but hold on before you run out and buy one. Why? You’re probably not going to get a new one and if you do, it will really be at least three years old, as HGST/WDC hasn’t made these drives in at least that long. If someone from HGST/WDC can confirm or deny that for us in the comments that would be great. There are stories dating back to 2016 where folks tried to order this drive and got a refurbished drive instead. If you want to give a refurbished drive a try, that’s fine, but that’s not what our numbers are based on.

The Q3 2020 annualized failure rate (AFR) of 0.89% is slightly higher than last quarter at 0.81%, but significantly lower than the 2.07% from a year ago. Even with the lower drive failure rates, our data center techs are not bored. In this quarter they added nearly 11,000 new drives totaling over 150PB of storage, all while operating under strict Covid-19 protocols. We’ll cover how they did that in a future post, but let’s just say they were busy.

The Island of Misfit Drives

There were 190 drives (150,947 minus 150,757) that were not included in the Q3 2020 Quarterly Chart above because we did not have at least 60 drives of a given model. Here’s a breakdown:

Nearly all of these drives were used as replacement drives. This happens when a given drive model is no longer available for purchase, but we have many in operation and we need a replacement. For example, we still have three WDC 6TB drives in use; they are installed in three different Storage Pods, along with 6TB drives from Seagate and HGST. Most of these drives were new when they were installed, but sometimes we reuse a drive that was removed from service, typically via a migration. Such drives are, of course, reformatted, wiped, and then must pass our qualification process to be reinstalled.

There are two “new” drives on our list. These are drives that are qualified for use in our data centers, but we haven’t deployed in quantity yet. In the case of the 10TB HGST drive, the availability and qualification of multiple 12TB models has reduced the likelihood that we would use more of this drive model. The 16TB Toshiba drive model is more likely to be deployed going forward as we get ready to deploy the next wave of big drives.

The Big Drives Are Here

When we first started collecting hard drive data back in 2013, a big drive was 4TB, with 5TB and 6TB drives just coming to market. Today, we’ll define big drives as 14TB, 16TB, and 18TB drives. The table below summarizes our current utilization of these drives.

The total of 19,878 represents 13.2% of our operational data drives. While most of these are the 14TB Toshiba drives, all of the above have been qualified for use in our data centers.

For all of the drive models besides the Toshiba 14TB drive, the number of drive days is still too small to conclude anything, although the Seagate 14TB model, the Toshiba 16TB model, and the Seagate 18TB model have experienced no failures to date.

We will continue to add these large drives over the coming quarters and track them along the way. As of Q3 2020, the lifetime AFR for this group of drives is 1.04%, which as we’ll see, is below the lifetime AFR for all of the drive models in operation.

Lifetime Hard Drive Failure Rates

The table below shows the lifetime AFR for the hard drive models we had in service as of September 30, 2020. All of the drive models listed were in operation during this timeframe.
The lifetime AFR as of Q3 2020 was 1.58%, the lowest since we started keeping track in 2013. That is down from 1.73% one year ago, and down from 1.64% last quarter.

We added back the average age column as “Avg Age.” This is in months and is the average age of the drives used to compute the data in the table and is based on the amount of time they have been in operation. One thing to remember is that our environment is very dynamic with drives being added, being migrated, and leaving on a regular basis and this could impact the average age. For example, we could retire a Storage Pod with mostly older drives and that could lower the average age of the remaining drives of that model while those remaining drives got older.

Looking at the average age, the 6TB Seagate drives are the oldest cohort, averaging nearly five and a half years of service each. These drives have actually gotten better over the last couple years and are aging well with a current lifetime AFR of 1.0%.

If you’d like to learn more, join us for a webinar Q&A with the author of Hard Drive Stats, Andy Klein, on October 22, 10:00 a.m. PT.

The Hard Drive Stats Data

The complete data set used to create the information used in this review is available on our Hard Drive Test Data webpage. You can download and use this data for free for your own purpose. All we ask are three things: 1) You cite Backblaze as the source if you use the data, 2) You accept that you are solely responsible for how you use the data, and 3) You do not sell this data to anyone—it is free.

If you just want the summarized data used to create the tables and charts in this blog post, you can download the ZIP file containing the MS Excel spreadsheet.

Good luck and let us know if you find anything interesting.

The post Backblaze Hard Drive Stats Q3 2020 appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

Enhanced Ransomware Protection: Announcing Data Immutability With Backblaze B2 and Veeam

2020-10-08 Natasha Rabinov

Post Syndicated from Natasha Rabinov original https://www.backblaze.com/blog/object-lock-data-immutability/

Protecting businesses and organizations from ransomware has become one of the most, if not the most, essential responsibilities for IT directors and CIOs. Ransomware attacks are on the rise, occuring every 14 seconds, but you likely already know that. That’s why a top requested feature for Backblaze’s S3 Compatible APIs is Veeam® immutability—to increase your organization’s protection from ransomware and malicious attacks.

We heard you and are happy to announce that Backblaze B2 Cloud Storage now supports data immutability for Veeam backups. It is available immediately.

The solution, which earned a Veeam Ready-Object with Immutability qualification, means a good, clean backup is just clicks away when reliable recovery is needed.

It is the only public cloud storage alternative to Amazon S3 to earn Veeam’s certifications for both compatibility and immutability. And it offers this at a fraction of the cost.

“I am happy to see Backblaze leading the way here as the first cloud storage vendor outside of AWS to give us this feature. It will hit our labs soon, and we’re eager to test this to be able to deploy it in production.”—Didier Van Hoye, Veeam Vanguard and Technology Strategist

Using Veeam Backup & Replication, you can now simply check a box and make recent backups immutable for a specified period of time. Once that option is selected, nobody can modify, encrypt, tamper with, or delete your protected data. Recovering from ransomware is as simple as restoring from your clean, safe backup.

Freedom From Tape, Wasted Resources, and Concern

Prevention is the most pragmatic ransomware protection to implement. Ensuring that backups are up-to-date, off-site, and protected with a 3-2-1 strategy is the industry standard for this approach. But up to now, this meant that IT directors who wanted to create truly air-gapped backups were often shuttling tapes off-site—adding time, the necessity for on-site infrastructure, and the risk of data loss in transit to the process.

With object lock functionality, there is no longer a need for tapes or a Veeam virtual tape library. You can now create virtual air-gapped backups directly in the capacity tier of a Scale-out Backup Repository (SOBR). In doing so, data is Write Once, Read Many (WORM) protected, meaning that even during the locked period, data can be restored on demand. Once the lock expires, data can safely be modified or deleted as needed.

Some organizations have already been using immutability with Veeam and Amazon S3, a storage option more complex and expensive than needed for their backups. Now, Backblaze B2’s affordable pricing and clean functionality mean that you can easily opt in to our service to save up to 75% off of your storage invoice. And with our Cloud to Cloud Migration offers, it’s easier than ever to achieve these savings.

In either scenario, there’s an opportunity to enhance data protection while freeing up financial and personnel resources for other projects.

Backblaze B2 customer Alex Acosta, Senior Security Engineer at Gladstone
Institutes—an independent life science research organization now focused on fighting COVID-19—explained that immutability can help his organization maintain healthy operations. “Immutability reduces the chance of data loss,” he noted, “so our researchers can focus on what they do best: transformative scientific research.”

Enabling Immutability

How to Set Object Lock:

Data immutability begins by creating a bucket that has object lock enabled. Then within your SOBR, you can simply check a box to make recent backups immutable and specify a period of time.

What Happens When Object Lock Is Set:

The true nature of immutability is to prevent modification, encryption, or deletion of protected data. As such, selecting object lock will ensure that no one can:

Manually remove backups from Capacity Tier.
Remove data using an alternate retention policy.
Remove data using lifecycle rules.
Remove data via tech support.
Remove by the “Remove deleted items data after” option in Veeam.

Once the lock period expires, data can be modified or deleted as needed.

Getting Started Today

With immutability set on critical data, administrators navigating a ransomware attack can quickly restore uninfected data from their immutable Backblaze backups, deploy them, and return to business as usual without painful interruption or expense.

Get started with improved ransomware protection today. If you already have Veeam, you can create a Backblaze B2 account to get started. It’s free, easy, and quick, and you can begin protecting your data right away.

The post Enhanced Ransomware Protection: Announcing Data Immutability With Backblaze B2 and Veeam appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

How iconik Built a Multi-Cloud SaaS Solution

2020-10-06 Tim Child

Post Syndicated from Tim Child original https://www.backblaze.com/blog/how-iconik-built-a-multi-cloud-saas-solution/

This spotlight series calls attention to developers who are creating inspiring, innovative, and functional solutions with cloud storage. This month, we asked Tim Child, Co-founder of iconik, to explain the development of their cloud-based content management and collaboration solution.

How iconik Built a Multi-Cloud SaaS

The Challenge:

Back when we started designing iconik, we knew that we wanted to have a media management system that was hugely scalable, beyond anything our experienced team had seen before.

With a combined 50 years in the space, we had worked with many customer systems and not one of them was identical. Each customer had different demands for what systems should offer—whether it was storage, CPU, or database—and these demands changed over the lifecycle of the customer’s needs. Change was the only constant. And we knew that systems that couldn’t evolve and scale couldn’t keep up.

Identifying the Needs:

We quickly realized that we would need to meet constantly changing demands on individual parts of the system and that we needed to be able to scale up and down capabilities at a granular level. We wanted to have thousands of customers with each one potentially having hundreds of thousands, if not millions, of assets on the same instance, leading to the potential for billions of files being managed. We also wanted to have the flexibility to run private instances for customers if they so demanded.

With these needs in mind, we knew our service had to be architected and built to run in the cloud, and that we would run the business as a SaaS solution.

Mapping Our Architecture

Upon identifying this challenge, we settled on using a microservices architecture with each functional unit broken up and then run in Docker containers. This provided the granularity around functions that we knew customers would need. This current map of iconik’s architecture is nearly identical to what we planned from the start.

To manage these units while also providing for the scaling we sought, the architecture required an orchestration layer. We decided upon Kubernetes, as it was:

A proven technology with a large, influential community supporting it.
A well maintained open-source orchestration platform.
A system that functionally supported what we needed to do while also providing the ability to automatically scale, distribute, and handle faults for all of our containers.

During this development process, we also invested time in working with leading cloud IaaS and PaaS providers, in particular both Amazon AWS and Google Cloud, to discover the right solutions for production systems, AI, transcode, CDN, Cloud Functions, and compute.

Choosing a Multi-Cloud Approach

Based upon the learnings from working with a variety of cloud providers, we decided that our strategy would be to avoid being locked into any one cloud vendor, and instead pursue a multi-cloud approach—taking the best from each and using it to our customers’ advantage.

As we got closer to launching iconik.io in 2017, we started looking at where to run our production systems, and Google Cloud was clearly the winner in terms of their support for Kubernetes and their history with the project.

Looking at the larger picture, Google Cloud also had:

A world-class network with a presence of 93+ points in 64 global regions.
BigQuery, with its on-demand pricing, advanced scalability features, and ease of use.
Machine learning and AI tools that we had been involved in beta testing before they were built in, and which would provide an important element in our offering to give deep insights around media.
APIs that were rock solid.

These important factors became the deciding points on launching with Google Cloud. But, moving forward, we knew that our architecture would not be difficult to shift to another service if necessary as there was very little lock-in for these services. In fact, the flexibility provided allows us to run dedicated deployments for customers on their cloud platform of choice and even within their own virtual private cloud.

Offering Freedom of Choice for Storage

With our multi-cloud approach in mind, we wanted to bring the same flexibility we developed in production systems to our storage offering. Google Cloud Services was a natural choice because it was native to our production systems platform. From there, we grew options in line with the best fit for our customers, either based on their demands or based on what the vendor could offer.

From the start, we supported Amazon S3 and quickly brought Backblaze B2 Cloud Storage on board. We also allowed our customers to use their own Buckets to be truly in charge of their files. We continued to be led by the search for maximum scalability and flexibility to change on the fly.

While a number of iconik customers use B2 Cloud Storage or Amazon S3 as their only storage solution, many also take a multiple vendor approach because it can best meet their needs either in terms of risk management, delivery of files, or cost management.

Credit iconik, learn more in their Q2 2020 Media Stats report.

As we have grown, our multi-cloud approach has allowed us to onboard more services from Amazon—including AI, transcode, CDN, Cloud Functions, and compute for our own infrastructure. In the future, we intend to do the same with Azure and with IBM. We encourage the same for our customers as we allow them to mix and match AWS, Backblaze, GCS, IBM, and Microsoft Azure to match their strategy and needs.

Reaping the Benefits of a Multi-Cloud Solution

To date, our cloud-agnostic approach to building iconik has paid off.

This year, when iconik’s asset count increased by 293% to over 28M assets, there was no impact on performance.
As new technology has become available, we have been able to improve a single section of our architecture without impacting other parts.
By not limiting cloud services that can be used in iconik, we have been able to establish many rewarding partnerships and accommodate customers who want to keep the cloud services they already use.

Hopefully our story can help shed some light to help any others who are venturing out to build a SaaS of their own. We wish you luck!

The post How iconik Built a Multi-Cloud SaaS Solution appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

Oslo by Streamlabs, Collaboration for Creatives

2020-10-01

Post Syndicated from original https://www.backblaze.com/blog/oslo-journey-to-market/

Oslo by Streamlabs, Collaboration for Creatives

With a mission of empowering creatives, the team at Streamlabs was driven to follow up their success in live streaming by looking beyond the stream—and so, Oslo was born.

Oslo, generally available as of today, is the place where solo YouTubers and small video editing teams can upload, review, collaborate, and share videos. But, the road from Streamlabs to Oslo wasn’t a straight line. The intrepid team from Streamlabs had to muddle through painfully truthful market research, culture shock, an affordability dilemma, and a pandemic to get Oslo into the hands of their customers. Let’s take a look at how they did it.

Market Research and the Road to Oslo

In September 2019, Streamlabs was acquired by Logitech. Yes, that Logitech, the one who makes millions of keyboards and mice, and all kinds of equipment for gamers. That Logitech acquired a streaming company. Bold, different, and yet it made sense to nearly everyone, especially anyone in the video gaming industry. Gamers rely on Logitech for a ton of hardware, and many of them rely on Streamlabs to stream their gameplay on Twitch, YouTube, and Facebook.

About the same time, Ashray Urs, Head of Product at Streamlabs, and his team were in the middle of performing market research and initial design work on their next product: video editing software for the masses. And what they were learning from the market research was disconcerting. While their target audience thought it would be awesome if Streamlabs built a video editor, the market was already full of them and nearly everybody already had one, or two, or even three editing tools on hand. In addition, the list of requirements to build a video editor was daunting, especially for Ashray and his small team of developers.

The future of Oslo was looking bleak when a fork in the road appeared. While video editing wasn’t a real pain point, many solo creators and small video editing teams were challenged and often overwhelmed by a key function in any project: collaboration. Many of these creators spent more time sending emails, uploading and downloading files, keeping track of versions and updates, and managing storage instead of being creative. Existing video collaboration tools were expensive, complex, and really meant for larger teams. Taking all this in, Ashray and his team decided on a different road to Oslo. They would build a highly affordable, yet powerful, video collaboration and sharing service.

Oslo collaboration view screenshot

Culture Shock: Hardware Versus Software

As the Oslo project moved forward, a different challenge emerged for Ashray: communicating their plans and processes for the Oslo service to their hardware oriented parent company, Logitech.

For example, each thought quite differently about the product release process. Oslo, as a SaaS service could, if desired, update their product daily to all their customers, and they could add new features and new upsells in weeks or maybe months. Logitech’s production process, on the other hand, was oriented toward having everything ready so they could make a million units of a keyboard. With the added challenge of not having an “update now” button on those keyboards.

Logitech was not ignorant of software, having created and shipped device drivers, software tools, and other utilities. But to them, the Oslo release process felt like a product launch on steroids. This is the part in the story where the bigger company tells the little company they have to do things “our” way. And it would have been stereotypically “corporate” for Logitech to say no to Oslo, then bury it in the backyard and move on. Instead, they gave the project the green light and fully supported Ashray and his team as they moved forward.

Oslo - New Channel - Daily Vlogs

Backblaze B2 Powers Affordability

As the feature requirements around Oslo began to coalesce, attention turned to how Oslo would deliver on the goal to provide them at an affordable price. After all, solo YouTubers and small video teams were not known to have piles of money to spend on tools. The question became moot when they chose Backblaze B2 Cloud Storage as their storage vendor.

To start, Backblaze enabled Oslo to meet the pricing targets they had determined were optimal for their market. Choosing any of the other leading cloud storage vendors would have doubled or even tripled the subscription price of Oslo. That would have made Oslo a non-starter for much of its target audience.

On the cost side, many of the other cloud storage providers have complex or hidden terms, like charging for files you delete if you don’t keep them around long enough—30 day minimum for some vendors, 90 day minimum for others. Ashray had no desire to explain to customers that they had to pay extra for deleted files, nor did he want to explain to his boss why 20% of the cloud storage costs for the Oslo service were for deleted files. With Backblaze he didn’t have to do either, as each day Oslo’s data storage charges are based on the files they currently have stored, and not for files they deleted 30, 60, or even 89 days ago.

On the features side, the Backblaze B2 Native APIs enabled Oslo to implement their special upload links feature which allows collaborators to add files directly into a specific project. As the project editor, this feature allows you to send upload links to collaborators that they can use to upload files. The links can be time-based—e.g. good for 24 hours—and password protected, if desired.

Travel Recap video image collage

New Product Development in a Pandemic

About the time the Oslo team was ready to start development, they were sent home as their office closed due to the Covid-19 pandemic. The whiteboards full of flow charts, UI diagrams, potential issues, and more essential information were locked away. Ad hoc discussions and decisions from hallway encounters, lunchroom conversations, and cups of tea with colleagues stopped.

The first few days were eerie and uncertain, but like many other technology companies they began to get used to their new work environment. Yes, they had the advantage of being technologically capable as meeting apps, collaboration services, and messaging systems were well within their grasp, but they were still human. While it took some time to get into the work from home groove, they were able to develop, QA, run a beta program, and deliver Oslo, without a single person stepping back in the office. Impressive.

Oslo 1.0

Every project, software, hardware, whatever, has some twists and turns as you go through the process. Oslo could have been just another video editing service, could have cost three times as much, or could have been one more cancelled project due to Covid-19. Instead, the Oslo team delivered YouTubers and the like an affordable video collaboration and sharing service with lots of cool features aimed at having them spend less time being project managers and more time being creators.

Nice job, we’re glad Backblaze could help. You can get the full scoop about Oslo at oslo.io.

The post Oslo by Streamlabs, Collaboration for Creatives appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

Simplifying Complex: A Multi-Cloud Approach to Scaling Production

2020-09-22 Lora Maslenitsyna

Post Syndicated from Lora Maslenitsyna original https://www.backblaze.com/blog/simplifying-complex-a-multi-cloud-approach-to-scaling-production/

How do you grow your production process without missing a beat as you evolve over 20 years from a single magazine to a multichannel media powerhouse? Since there are some cool learnings for many of you, here’s a summary of our recent case study deep dive into Verizon’s Complex Networks.

Founders Marc Eckō of Eckō Unlimited and Rich Antoniello started Complex in 2002 as a bi-monthly print magazine. Over almost 20 years, they’ve grown to produce nearly 50 episodic series in addition to monetizing more than 100 websites. They have a huge audience reaching 21 billion lifetime views and 52.2 million YouTube subscribers with premium distributors including Netflix, Hulu, Corus, Facebook, Snap, MSG, Fuse, Pluto TV, Roku, and more. Their team of creatives produce new content constantly—covering everything from music to movies, sports to video games, and fashion to food—which means that production workflows are the pulse of what they do.

Looking for Data Storage During Constant Production

In 2016, the Complex production team was expanding rapidly, with recent acquisitions bringing on multiple new groups that all had their own workflows. They used a Terrablock by Facilis and a few “homebrewed solutions,” but there was no unified, central storage location, and they were starting to run out of space. As many organizations with tons of data and no space do, they turned to Amazon Glacier.

There were problems:

Visibility: They started out with Glacier Vault, but with countless hours of good content, they constantly needed to access their archive—which required accessing the whole thing just to see what was in there.
Accessibility: An upgrade to S3 Glacier made their assets more visible, but retrieving those assets still involved multiple steps, various tools, and long retrieval times—sometimes ranging to 12 hours.
Complexity: S3 has multiple storage classes, each with its own associated costs, fees, and wait times.
Expense: The worst of the issue was that this glacial process didn’t just slow down production, it also incurred huge expenses through egress charges.

The worst thing was, staff would wade through this process only to sometimes realize that the content sent back to them wasn’t what they were looking for. The main issue for the team was that they struggled to see all of their storage systems clearly.

Organizing Storage With Transparent Asset Management

They resolved to fix the problem once and for all by investing in three areas:

Empower their team to collaborate and share at the speed of their work.
Identify tools that would scale with their team instantaneously.
Incorporate off-site storage that mimicked their on-site solutions’ scaling and simplicity.

To remedy their first issue, they set up a centralized SAN—a Quantum StorNext—that allowed the entire team to work on projects simultaneously.

Second, they found iconik, which moved them away from the inflexible on-prem integration philosophies of legacy MAM systems. Even better, they could test-run iconik before committing.

Finally, because iconik is integrated with Backblaze B2 Cloud Storage, the team at Complex decided to experiment with a B2 Bucket. Backblaze B2’s pay-as-you-go service with no upload fees, no deletion fees, and no minimum data size requirements fit the philosophy of their approach.

There was one problem: It was easy enough to point new projects toward Backblaze B2, but they still had petabytes of data they’d need to move to fully enable this new workflow.

Setting Up Active Archive Storage

The post and studio operations and media infrastructure and technology teams estimated that they would have to copy at least 550TB of 1.5PB of data from cold storage for future distribution purposes in 2020. Backblaze partners were able to help solve the problem.

Flexify.IO uses cloud internet connections to achieve significantly faster migrations for large data transfers. Pairing Flexify with a bare-metal cloud services platform to set up metadata ingest servers in the cloud, Complex was able to migrate to B2 Cloud Storage directly with their files and file structure intact. This allowed them to avoid the need to pull 550TB of assets into local storage just to ingest assets and make proxy files.

More Creative Possibilities With a Flexible Workflow

Now, Complex Networks is free to focus on creating new content with lightning-fast distribution. Their creative team can quickly access 550TB of archived content via proxies that are organized and scannable in iconik. They can retrieve entire projects and begin fresh production without any delays. “Hot Ones,” “Sneaker Shopping,” and “The Burger Show”—the content their customers like to consume, literally and figuratively, is flowing.

Is your business facing a similar challenge?

Read the case study to learn more.
Check out our Cloud to Cloud Migration offer and other transfer partners—we’ll pay for your data transfer if you need to move more than 50TB out of Amazon.

The post Simplifying Complex: A Multi-Cloud Approach to Scaling Production appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

The Path to S3 Compatible APIs: The Authentication Challenge

2020-09-17 Malay Shah

Post Syndicated from Malay Shah original https://www.backblaze.com/blog/the-path-to-s3-compatible-apis-the-authentication-challenge/

We launched our Backblaze S3 Compatible APIs in May of 2020 and released them for GA in July. After a launch, it’s easy to forget about the hard work that made it a reality. With that in mind, we’ve asked Malay Shah, our Senior Software Engineering Manager, to explain one of the challenges he found intriguing in the process. If you’re interested in developing your own APIs, or just curious about how ours have come to be, we think you’ll find Malay’s perspective interesting.

When we started building our Backblaze S3 Compatible APIs, we already had Backblaze B2 Cloud Storage, so the hard work to create a durable, scalable, and highly performant object store was already done. And B2 was already conceptually similar to S3, so the task seemed far from impossible. That’s not to say that it was easy or without any challenges. There were enough differences between the B2 Native APIs and the S3 API to make the project interesting, and one of those is authentication. In this post, I’m going to walk you through how we approached the challenge of authentication in our development of Backblaze S3 Compatible APIs.

The Challenge of Authentication: S3 vs. B2 Cloud Storage

B2 Cloud Storage’s approach to authentication is login/session based, where the API key ID and secret are used to log in and obtain a session ID, which is then provided on each subsequent request. S3 requires each individual request to be signed using the key ID and secret.

Our login/session approach does not require storing the API key secret on our end, only a hash of it. As a result, any compromise of our database would not allow hackers to impersonate customers and access their data. However, this approach is susceptible to “man-in-the-middle” attacks. Capturing the login request (API call to b2_authorize_account) would reveal the API key ID and secret to the attacker; capturing subsequent requests would reveal the session ID which is valid for 24 hours. Either of these would allow a hacker to impersonate a customer, which is clearly not a good thing. That said, our system and basic data safety practices will protect users. First, it is important to maintain your trusted certificate list. Our APIs are only available over HTTPS, and HTTPS in conjunction with a well managed trusted certificate list mitigates the likelihood of a “man-in-the-middle” attack.

Amazon’s approach with S3 requires their backend to store the secret because authenticating a request requires the backend to replicate the request signing process for each call. As a result, request signing is much less susceptible to a “man-in-the-middle” attack. The most any bad actor could do is replay the request; a hacker would not be able to impersonate the customer and make other requests. However, compromising the systems that store the API key secret would allow impersonation of the customer. This risk is typically mitigated by encrypting the API key secret and storing that key somewhere else, thus requiring multiple systems to be compromised.

Both approaches are common patterns for authentication, each with their own strengths and risks.

Storing the API Key Secret

To implement AWS’s request signing in our system, we first needed to figure out how to store the API key secret. A compromise of our database by a hacker who has obtained the hash of the secret for B2 does not allow that hacker to impersonate customers, but if we stored the secret itself, it absolutely would. So we couldn’t store the secret alongside the other application key data. We needed another solution, and it needed to handle the number of application keys we have (millions) and the volume of API requests we service (hundreds of thousands per minute), without slowing down requests or adding additional risks of failure.

Our solution is to encrypt the secret and store that alongside the other application key data in our database. The encryption key is then kept in a secrets management solution. The database already supports the volume of requests we service and decrypting the secret is computationally trivial, so there is no noticeable performance overhead.

With this approach, a compromise of the database alone would only reveal the encrypted version of the secret, which is just as useless as having the hash. Multiple systems must be compromised to obtain the API key secret.

Implementing the Request Signing Algorithm

We chose to only implement AWS’s Signature Version 4 as Version 2 is deprecated and is not allowed for use on newly created buckets. Within Version 4, there are multiple ways to sign the request: sign only the headers, sign the whole request, sign individual chunks, and pre-signed URLs. All of these follow a similar pattern but differ enough to warrant individual consideration for testing. We absolutely needed to get this right so we tested authentication in many ways:

Ran through Amazon’s test suite of example requests and expected signatures
Tested 20 applications that work with Backblaze S3 Compatible APIs including Veeam and Synology
Ran Ceph’s S3-tests suite
Manually tested using the AWS command line interface
Manually tested using Postman
Built automated tests using both the Python and Java SDKs
Made HTTP requests directly to test cases not possible through the Python or Java SDKs
Hired ~~hackers~~ security researchers to break our implementation

With the B2 Native API authentication model, we can verify authentication by examining the “Authorization” header and only then move on to processing the request, but S3 requests—where the whole request is signed or uses signed chunks—can only be verified after reading the entire request body. For most of the S3 APIs, this is not an issue. The request bodies can be read into memory, verified, and then continue on to processing. However, for file uploads, the request body can be as large as 5GB—far too much to store in memory—so we reworked our uploading logic to handle authentication failures occurring at the end of the upload and to only record API usage after authentication passes.

The different ways to sign requests meant that in some cases we have to verify the request after the headers arrive, and in other cases verify only after the entire request body is read. We wrote the signature verification algorithm to handle each of these request types. Amazon had published a test suite (which is now no longer available, unfortunately) for request signing. This test suite was designed to help people call into the Amazon APIs, but due to the symmetric nature of the request signing process, we were able to use it as well to test our server-side implementation. This was not an authoritative or comprehensive test suite, but it was a very helpful starting point. As was the AWS command line interface, which in debug mode will output the intermediate calculations to generate the signature, namely the canonical request and string to sign.

However, when we built our APIs on top of the signature validation logic, we discovered that our APIs handled reading the request body in different ways, leading to some APIs succeeding without verifying the request, yikes! So there were even more combinations that we needed to test, and not all of these combinations could be tested using the AWS software development kits (SDKs).

For file uploads, the SDKs only signed the headers and not the request body—a reasonable choice for file uploads. But as implementers, we must support all legal requests so we made direct HTTP requests to verify whole request signing and signed chunk requests. There’s also instrumentation now to ensure that all successful requests are verified.

Looking Back

We expected this to be a big job, and it was. Testing all the corner cases of request authentication was the biggest challenge. There was no single approach that covered everything; all of the above items tested different aspects of authentication. Having a comprehensive and multifaceted testing plan allowed us to find and fix issues we would have never thought of, and ultimately gave us confidence in our implementation.

The post The Path to S3 Compatible APIs: The Authentication Challenge appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

Increasing Thread Count: Useful in Sheets and Cloud Storage Speed

2020-08-27 Troy Liljedahl

Post Syndicated from Troy Liljedahl original https://www.backblaze.com/blog/increasing-thread-count-useful-in-sheets-and-cloud-storage-speed/

As the Solutions Engineering Manager, I have the privilege of getting to work with amazing customers day in and day out to help them find a solution that fits their needs. For Backblaze B2 Cloud Storage, this includes helping them find an application to manage their media—like iconik, or setting up their existing infrastructure with our Backblaze S3 Compatible APIs, or even providing support for developers writing to our B2 Native APIs.

But regardless of the solution, one of the most common questions I get when talking with these customers is, “How do I maximize my performance to Backblaze??” People want to go fast. And the answer almost always comes down to: threads.

What Are Threads?

If you do not know what a thread is used for besides sewing, you’re not alone. First of all, threads go by many different names. Different applications may refer to them as streams, concurrent threads, parallel threads, concurrent upload, multi-threading, concurrency, parallelism, and likely some other names I haven’t come across yet.

But what all these terms refer to when we’re discussing B2 Cloud Storage is the process of uploading files. When you begin to transmit files to Backblaze B2, they are being communicated by threads. (If you’re dying for an academic description of threads, feel free to take some time with this post we wrote about them). Multithreading, not surprisingly, is the ability to upload multiple files (or multiple parts of one file) at the same time. It won’t shock you to hear that many threads are faster than one thread. The good news is that B2 Cloud Storage is built from the ground up to take advantage of multithreading—it is able to take as many threads as you can throw at it for no additional charge and your performance should scale accordingly. But it does not automatically do so, for reasons we’ll discuss right now.

Fine-tuning

Of course, this begs the question, why not just turn everything up to one million threads for UNLIMITED POWER!!!!????

Well, chances are your device can’t handle or take advantage of that many threads. The more threads you have open, the more taxing it will be on your device and your network, so it often takes some trial and error to find the sweet spot to get optimal performance without severely affecting the usability of your device.

Try adding more threads and see how the performance changes after you’ve uploaded for a while. If you see improvements in the upload rate and don’t see any performance issues with your device, then try adding some more and repeating the process. It might take a couple of tries to figure out the optimal number of threads for your specific environment. You’ll be able to rest assured that your data is moving at optimal power (not quite as intoxicating as unlimited power, but you’ll thank me when your computer doesn’t just give up).

How To Increase Your Thread Count

Some applications will take the guesswork out of this process and set the number of threads automatically (like our Backblaze Personal Backup and Backblaze Business Backup client does for users) while others will use one thread unless you say otherwise. Each application that works with B2 Cloud Storage treats threads a little differently. So we’ve included a few examples of how to adjust the number of threads in the most popular applications that work with B2 Cloud Storage—including Veeam, rclone, and SyncBackPro.

If you’re struggling with slow uploads in any of the many other integrations we support, check out our knowledge base to see if we offer a guide on how to adjust the threading. You can also reach out to our support team 24/7 via email for assistance in finding out just how to thread your way to the ultimate performance with B2 Cloud Storage.

Veeam

This one is easy—Veeam automatically uses up to 64 threads per VM (not to be confused with “concurrent tasks”) when uploading to Backblaze B2. To increase threading you’ll need to use per-VM backup files. You’ll find Veeam-recommended settings in the Advanced Settings of the Performance Tier in the Scale-out Repository. (See screenshot below).

Rclone

Rclone allows you to use the --transfers flag to adjust the number of threads up from the default of four. Rclone’s developer team has found that their optimal setting was --transfers 32, but every configuration is going to be different so you may find that another number will work better for you.

rclone sync /Users/Troy/Downloads b2:troydemorclone/downloads/ --transfers 20

Tip: If you like to watch and see how fast each file is uploading, use the --progress (or -P) flag and you’ll see the speeds of each upload thread!

SyncBackPro

SyncBackPro is an awesome sync tool for Windows that supports Backblaze B2 as well the ability to only sync deltas of a file (the parts that have changed). SyncBackPro uses threads in quite a few places across its settings, but the part that concerns how many concurrent threads will upload to Backblaze B2 is in the “Number of upload/download threads to use” setting. You can find this in the Cloud Settings under the Advanced tab. You’ll notice they even throw in a warning letting you know that too many will degrade performance!

Happy Threading!

I hope this guide makes working with B2 Cloud Storage a little faster and easier for you. If you’re able to make these integrations work for your use case, or you’ve already got your threading perfectly calibrated, we’d love to hear about your experience and learnings in the comments.

The post Increasing Thread Count: Useful in Sheets and Cloud Storage Speed appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.