Tag Archives: Cloud Storage

Cloud Storage Pricing: What You Need to Know

Post Syndicated from Molly Clancy original https://www.backblaze.com/blog/cloud-storage-pricing-what-you-need-to-know/

Between tech layoffs and recession fears, economic uncertainty is at a high. If you’re battening down the hatches for whatever comes next, you might be taking a closer look at your cloud spend. Even before the bear market, 59% of cloud decision makers named “optimizing existing use of cloud (cost savings)” as their top cloud initiative of 2022 according to the Flexera State of the Cloud report.

Cloud storage is one piece of your cloud infrastructure puzzle, but it’s one where some simple considerations can save you anywhere from 25% up to 80%. As such, understanding cloud storage pricing is critical when you compare solutions, and it helps you decide which provider is right for your organization.

In this post, we won’t look at 1:1 comparisons of cloud storage pricing, but you can check out a price calculator here. Instead, you will learn tips to help you make a good cloud storage decision for your organization.

Evaluating Your Cloud Storage? Gather These Facts

Looking at the pricing options of different cloud providers only makes sense when you know your needs. Use the following questions to clarify your storage needs and approach a cloud decision thoughtfully:

  1. How do you plan to use cloud storage?
  2. How much does cloud storage cost?
  3. What additional features are offered?

1. How Do You Plan to Use Cloud Storage?

Some popular use cases for cloud storage include:

  • Backup and archive.
  • Origin storage.
  • Migrating away from LTO/tape.
  • Managing a media workflow.

Backup and Archive

Maintaining data backups helps make your company more resilient. You can more easily recover from a disaster and keep serving customers. The cloud provides a reliable, off-site place to keep backups of your company workstations, servers, NAS devices, and Kubernetes environments.

Case Study: Famed Photographer Stores a Lifetime of Work

Photographer Steve McCurry, renowned for his 1984 photo of the “Afghan Girl” which has been on the cover of National Geographic several times, backed up his life’s work in the cloud when his team didn’t want to take chances with his irreplaceable archives.

Origin Storage

If you run a website, video streaming service, or online gaming community, you can use the cloud to serve as your origin store where you keep content to be served out to your users.

Case Study: Serving 1M+ Websites From Cloud Storage

Big Cartel hosts more than one million e-commerce websites. To increase resilience, the company recently started using a second cloud provider. By adopting a multi-cloud infrastructure, the business now has lower costs and less risk of failure.

Migrating Away From LTO/Tape

Managing a tape library can be time-consuming and comes with high CapEx spending. With inflation, replacing tapes costs more, shipping tapes off-site costs more, and physical storage space costs more. The cloud provides an affordable alternative to storing data on tape: the cloud provider worries about provisioning enough physical storage devices and space, while you simply pay as you go.

Managing Media Workflow

Your department or organization may need to work with large media files to create movies or digital videos. Cloud storage provides an alternative to provisioning huge on-premises servers to handle large files.

Case Study: Using the Cloud to Store Media

Hagerty Insurance stored a huge library of video assets on an aging server that couldn’t keep up. They implemented a hybrid cloud solution for cloud backup and sync, saving the team over 200 hours per year searching for files and waiting for their slow server to respond.

2. How Much Does Cloud Storage Cost?

Cloud storage costs are calculated in a variety of ways. Before considering any specific vendors, it’s helpful to know the most common options, variables, and fees, including:

  • Flat or single-tier pricing vs. tiered pricing.
  • Hot vs. cold storage.
  • Storage location.
  • Minimum retention periods.
  • Egress fees.

Flat or Single-tier Pricing vs. Tiered Pricing

A flat or single-tier pricing approach charges the user based on the storage volume, and cost is typically expressed per gigabyte stored. There is only one tier, making budgeting and planning for cloud expenses simple.

On the other hand, some cloud storage services use a tiered storage pricing model. For example, a provider may have a small business pricing tier and an enterprise tier. Note that different pricing tiers may include different services and features. Today, your business might use an entry-level pricing tier but need to move to a higher-priced tier as you produce more data.

Hot vs. Cold Storage

Hot storage is helpful for data that needs to be accessible immediately (e.g., last month’s customer records). By contrast, cold storage is helpful for data that does not need to be accessed quickly (e.g., tax records from five years ago). For more insight on hot vs. cold storage, check out our post: “What’s the Diff: Hot and Cold Data Storage.” Generally speaking, cold storage is the cheapest, but that low price comes at the cost of speed. For data that needs to be accessed frequently or even for data where you’re not sure how often you need access, hot storage is better.

Storage Location

Some organizations need their cloud storage to be located in the same country or region due to regulations or just preference. But some storage vendors charge different prices to store data in different regions. Keeping data in a specific location may impact cloud storage prices.

Minimum Retention Periods

Most folks think of “retention” as a good thing, but some storage vendors enforce minimum retention periods that essentially impose penalties for deleting your data. Some vendors enforce minimum retention periods of 30, 60, or even 90 days. Deleting your data could cost you a lot, especially if you have a backup approach where you retire older backups before the retention period ends.

Egress Fees

Cloud companies charge egress fees when customers want to move their data out of the provider’s platform. These fees can be egregiously high, making it expensive for customers to use multi-cloud infrastructures and therefore locking customers into their services.

3. What Additional Features Are Offered?

While price is likely one of your biggest considerations, choosing a cloud storage provider solely based on price can lead to disappointment. There are specific cloud storage features that can make a big difference in your productivity, security, and convenience. Keep these features and capabilities in mind when comparing different cloud storage solutions.

Security Features

You may be placing highly sensitive data like financial records and customer service data in the cloud, so features like server-side encryption could be important. In addition, you might look for a provider that offers Object Lock so you can protect data using a Write Once, Read Many (WORM) model.

Data Speed

Find out how quickly the cloud storage provider can move your data, in terms of both upload and download speed. Keep in mind that the speed of your internet connection also impacts how fast you can access data. Data speed is critically important in several industries, including media and live streaming.

Customer Support

If your company has a data storage problem outside of regular business hours, customer support becomes critically important. What level of support can you expect from the provider? Do they offer expanded support tiers?

Partner Integrations

Partner integrations make it easier to manage your data. Check if the cloud storage provider has integrations with services you already use.

The Next Step in Choosing Cloud Storage

Understanding cloud storage pricing requires a holistic view. First, you need to understand your organization’s data needs. Second, it is wise to understand the typical cloud storage pricing models commonly used in the industry. Finally, cloud storage pricing needs to be understood in the context of features like security, integrations, and customer service. Once you consider these steps, you can approach a decision to switch cloud providers or optimize your cloud spend more rigorously and methodically.

The post Cloud Storage Pricing: What You Need to Know appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

Fortune Favors the Backup: How One Media Brand Protected High-profile Video Footage

Post Syndicated from Molly Clancy original https://www.backblaze.com/blog/fortune-favors-the-backup-how-one-media-brand-protected-high-profile-video-footage/

Leading business media brand Fortune has amassed hundreds of thousands of hours of footage capturing conference recordings, executive interviews, panel discussions, and more, showcasing some of the world’s most high-profile business leaders over the years. It’s the jewel in their content crown, and there are no second chances when it comes to capturing those moments. If any of those videos were to be lost or damaged, they’d be gone forever, with potential financial consequences to boot.

At the same time, Fortune’s distributed team of video editors needs regular and reliable access to that footage for use on the company’s sites, social media channels, and third-party web properties. So when Fortune was divested from its parent company, Meredith Corporation, in 2018, revising its tech infrastructure was a priority.

Becoming an independent enterprise gave Fortune the freedom to escape legacy limitations and pop the cork on bottlenecks that were slowing productivity and racking up expenses. But their first attempt at a solution was expensive, unreliable, and difficult to use—until they migrated to Backblaze B2 Cloud Storage. Jeff Billark, Head of IT Infrastructure for Fortune Media Group, shared how it all went down.

Not Quite Camera-ready: An Overly Complex Tech Stack

Working with systems integrator CHESA, Fortune used a physical storage device to seed data to the cloud. They then built a tech stack that included:

  • An on-premises server housing Primestream Xchange media asset management (MAM) software for editing, tagging, and categorization.
  • Archive management software to handle backups and long-term archiving.
  • Cold object storage from one of the diversified cloud providers to hold backups and archive data.

But it didn’t take long for the gears to gum up. The MAM system couldn’t process the huge quantity of data in the archive they’d seeded to the cloud, so unprocessed footage stayed buried in cold storage. To access a video, Fortune editors had to work with the IT department to find the file, thaw it, and save it somewhere accessible. And the archiving software wasn’t reliable or robust enough to handle Fortune’s file volume; it indicated that video files had been archived without ever actually writing them to the cloud.

Time for a Close-up: Simplifying the Archive Process

If they hadn’t identified the issue quickly, Fortune could have lost 100TB of active project data. That’s when CHESA suggested Fortune simplify its tech stack by migrating from the diversified cloud provider to Backblaze B2. Two key tools allowed Fortune to eliminate archiving middleware by making the move:

  1. Thanks to Primestream’s new Backblaze data connector, Backblaze integrated seamlessly with the MAM system, allowing Fortune to write files directly to the cloud.
  2. They implemented Panic’s Transmit tool to allow editors to access the archives themselves.

Backblaze’s Universal Data Migration program sealed the deal by eliminating the transfer and egress fees typically associated with a major data migration. Fortune transferred over 300TB of data in less than a week with zero downtime, business disruption, or egress costs.

For Fortune, the most important benefits of migrating to Backblaze B2 were:

  • Increasing reliability around both archiving and downloading video files.
  • Minimizing the need for IT support with a system that’s easy to use and manage.
  • Unlocking self-service options within a modern digital tech experience.

“Backblaze really speeds up the archive process because data no longer has to be broken up into virtual tape blocks and sequences. It can flow directly into Backblaze B2.”
—Jeff Billark, Head of IT Infrastructure, Fortune Media Group

Unlocking Hundreds of Thousands of Hours of Searchable, Accessible Footage

Fortune’s video editing team now works with two Backblaze B2 buckets that editors can access without any additional IT support:

Bucket #1: 100TB of active video projects.
When any of the team’s video editors needs to find and manipulate footage that’s already been ingested into Primestream, it’s easy to locate the right file and kick off a streamlined workflow that leads to polished, new video content.

Bucket #2: 300TB of historical video files.
Using Panic’s Transmit tool, editors sync data between their Mac laptops and Backblaze B2 and can easily search historical footage that has not yet been ingested into Primestream. Once files have been ingested and manipulated, editors can upload the results back to Bucket #1 for sharing, collaboration, and storage purposes.

With Backblaze B2, Fortune’s approach to file management is simple and reliable. The risk of archiving failures and lost files is greatly reduced, and self-service workflows empower editors to collaborate and be productive without IT interruptions. Fortune also reduced storage and egress costs by about two-thirds, all while accelerating its content pipeline and maximizing the potential of its huge and powerful video archive.

“Backblaze is so simple to use, our editors can manage the entire file transfer and archiving process themselves.”
—Jeff Billark, Head of IT Infrastructure, Fortune Media Group

The post Fortune Favors the Backup: How One Media Brand Protected High-profile Video Footage appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

Backblaze and Carahsoft Help Public Sector CIOs Optimize Cloud Spend

Post Syndicated from Elton Carneiro original https://www.backblaze.com/blog/backblaze-and-carahsoft-help-public-sector-cios-optimize-cloud-spend/

If you’re in charge of IT for a public sector entity, you know the budgeting and procurement process doesn’t lend itself well to buying cloud services. But, today, the life of a public sector CIO just got a whole lot easier. Through a new partnership with Carahsoft, public sector customers can now leverage their existing state, local, and federal buying programs to access Backblaze B2 Cloud Storage.

We’re not the only cloud storage provider available through Carahsoft, the Master Government Aggregator™ for the IT industry, but we are the easy, affordable, trusted solution among providers in their ecosystem. Read on to learn more about the partnership.

The Right Cloud Solution at the Right Time

For state and local governments, federal agencies, healthcare providers, and higher education institutions, the pandemic introduced challenges that required cloud scalability—remote work and increased demand for public services, to name two. But due to procurement procedures and budgeting incompatibility, adopting the cloud isn’t always a smooth process for the public sector.

The public sector typically uses a CapEx model to budget for IT services. The cloud’s pay-as-you-go pricing model can be at odds with this budgeting method. Public sector CIOs are also typically required to use established buying programs to purchase services, which many cloud providers are not a part of.

Further, recent research shows that while public sector cloud adoption has increased, a “budget snapback” driven by return-to-office IT expenses is prompting CIOs in this field to optimize their cloud spend. Public sector institutions are seeking additional value in their cloud budgets, and clamoring for a way to purchase those services through existing programs and channels.

“Public sector decision-makers reference budget, pricing models, and transparency as their biggest barriers to cloud adoption. That’s why this partnership is so exciting: Our services come at a fraction of the price of other options, and we’ve long been known for our transparent, trusted approach to working with customers.”
—Nilay Patel, VP of Sales, Backblaze

Bringing Capacity-based Cloud Services to the Public Sector

Backblaze, through the partnership with Carahsoft—which was enabled by our recent launch of a capacity-based pricing bundle, Backblaze B2 Reserve—solves both the budgeting and procurement challenges public sector CIOs are facing.

The partnership brings Backblaze services to state, local, and federal buying programs in a model they prefer at a fraction of the price of traditional cloud storage providers. It’s an affordable, easy solution for public sector CIOs seeking to optimize cloud spend in the wake of the pandemic.

“Backblaze’s ease of use, affordability, and transparency are just some of the major advantages of their robust cloud backup and storage services. We look forward to working with Backblaze and our reseller partners to help agencies better protect and secure their business data.”
—Evan Slack, Director of Sales for Emerging Cloud and Virtualization Technologies, Carahsoft

About Carahsoft

Carahsoft Technology Corp. is The Trusted Government IT Solutions Provider®, supporting public sector organizations across federal, state, and local government agencies and education and healthcare markets. As the Master Government Aggregator® for vendor partners, Carahsoft delivers solutions for cybersecurity, multi-cloud, DevSecOps, big data, artificial intelligence, open-source, customer experience, and more. Working with resellers, systems integrators, and consultants, Carahsoft’s sales and marketing teams provide industry-leading IT products, services, and training through hundreds of contract vehicles.

About Backblaze B2 Reserve

Backblaze B2 Reserve packages cloud storage in a capacity-based bundle with an annualized SKU which works seamlessly with channel billing models. The offering also provides seller incentives, Tera-grade support, and expanded migration services to empower the channel’s acceleration of cloud storage adoption and revenue growth. Customers can purchase Backblaze B2 through channel partners, starting at 20TB.

A Public Sector Case Study: Kings County Modernizes With Backblaze B2 Cloud Storage

With a looming bill to replace aging tapes and an out-of-warranty tape drive, the Kings County IT department modernized their IT infrastructure by moving to the cloud for backups. With help from Backblaze, Kings County natively tiered backups from their preferred backup software to Backblaze B2 Cloud Storage, enabling them to implement incremental backups, reduce their overall IT footprint and costs, and save about 150 hours of staff time per year.

Read the full case study here.

How to Get Started With Backblaze B2 and Carahsoft

For resellers interested in offering Backblaze services, it is business as usual if you currently have an account with Carahsoft. Those with immediate quote requests should email partnerships@backblaze.com for further details. For any resellers who do not have an account with Carahsoft and would like the ability to sell Backblaze services, follow this link to create a Carahsoft account.

The post Backblaze and Carahsoft Help Public Sector CIOs Optimize Cloud Spend appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

Double Redundancy, Support Compliance, and More With Cloud Replication: Now Live

Post Syndicated from Jeremy Milk original https://www.backblaze.com/blog/double-redundancy-support-compliance-and-more-with-cloud-replication-now-live/

Cloning is a little bit creepy (Seriously, you can clone your pet now?), but having clones of your data is far from it—creating and storing redundant copies is essential when it comes to protecting your business, complying with regulations, or developing apps. With Backblaze Cloud Replication—now generally available—you can get set up in just a few clicks to automatically copy data across buckets, accounts, or regions.

Unbox Backblaze Cloud Replication

Join us for a webinar to unbox all the capabilities of Cloud Replication on July 13, 2022 at 10 a.m. PDT with Sam Lu, Product Manager at Backblaze.

➔ Sign Up

Existing customers can start using Cloud Replication immediately by clicking on Cloud Replication within their Backblaze account or via the Backblaze B2 Native API.

Simply click on Cloud Replication in your account to get started.

Not a Backblaze customer yet? Sign up here. And read on for more details on how this feature can benefit you.

What Is Backblaze Cloud Replication?

Backblaze Cloud Replication is a new service that allows customers to automatically copy data to different locations—across regions, across accounts, or in different buckets within the same account. You can set replication rules in a few easy steps.

Once the rules are set on a given bucket, any data uploaded to that bucket will automatically be replicated into the destination bucket you choose.

What Is Cloud Replication Good For?

There are three main reasons you might want to use Cloud Replication:

  • Data Redundancy: Replicating data for security, compliance, and continuity purposes.
  • Data Proximity: Bringing data closer to distant teams or customers for faster access.
  • Replication Between Environments: Replicating data between testing, staging, and production environments when developing applications.

Data Redundancy

Keeping redundant copies of your data is the most common use case for Cloud Replication. Enterprises with comprehensive backup strategies, especially as they are increasingly cloud-based, will likely find Cloud Replication immediately applicable. It can help businesses:

  • Recover quickly from natural disasters and cybersecurity threats.
  • Support modern business continuity.
  • Reduce the risk of data loss and downtime.
  • Comply with industry or board regulations centered on concentration risk issues.
  • Meet data residency requirements stemming from regulations like GDPR.

Data redundancy has always been a best practice—the gold standard for backup strategies has long been a 3-2-1 approach. The core principles of 3-2-1—keeping at least three copies of your data, on two different media, with one copy off-site—were originally developed for an on-premises world. They still hold true, and today they are being applied in even more robust ways to an increasingly cloud-based world.

Backblaze’s Cloud Replication helps businesses apply the principles of 3-2-1 within a cloud-first or cloud-dominant infrastructure. By storing to multiple regions and/or multiple buckets in the same region, businesses virtually achieve an “off-site” backup—easily and automatically protecting data from natural disasters, political instability, or even run-of-the-mill compliance headaches.

Data Proximity

If you have teams, customers, or workflows spread around the world, bringing a copy of your data closer to where work gets done can minimize speed-of-light limitations. Especially for media-heavy teams in industries like game development and postproduction, seconds can make the difference in keeping creative teams operating smoothly. And because you can automate replication and use metadata to track accuracy and process, you can remove some of the manual steps where errors and data loss tend to crop up.

Replication Between Environments

Version control and smoke testing are nothing new, but when you’re controlling versions of large applications or trying to keep track of what’s live and what’s in testing, you might need a tool with more horsepower and options for customization. Backblaze Cloud Replication can serve these needs.

You can easily replicate objects between buckets dedicated for production, testing, or staging if you need to use the same data and maintain the same metadata. This allows you to observe best practices and automate replication between environments.

Want to Learn More About Backblaze Cloud Replication?

  • Join the webinar on July 13, 2022 at 10 a.m. PDT.
  • Here’s a walk-through of Cloud Replication, including step-by-step instructions for using Cloud Replication via the web UI and the Backblaze B2 Native API.
  • Access documentation here.
  • Check out our Help articles on how to create rules here.

If you’re a new customer, click here to sign up for Backblaze B2 Cloud Storage and learn more about Cloud Replication.

The post Double Redundancy, Support Compliance, and More With Cloud Replication: Now Live appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

Optimize Your Media Production Workflow With iconik, LucidLink, and Backblaze B2

Post Syndicated from Pat Patterson original https://www.backblaze.com/blog/optimize-your-media-production-workflow-with-iconik-lucidlink-and-backblaze-b2/

In late April, thousands of professionals from all corners of the media, entertainment, and technology ecosystem assembled in Las Vegas for the National Association of Broadcasters trade show, better known as the NAB Show. We were delighted to sponsor NAB after its two-year hiatus due to COVID-19. Our staff came in blazing hot and ready to hit the tradeshow floor.

One of the stars of the 2022 event was Backblaze partner LucidLink, named a Cloud Computing and Storage category winner in the NAB Show Product of the Year Awards. In this blog post, I’ll explain how to combine LucidLink’s Filespaces product with Backblaze B2 Cloud Storage and media asset management from iconik, another Backblaze partner, to optimize your media production workflow. But first, some context…

How iconik, LucidLink, and Backblaze B2 Fit in a Media Storage Architecture

The media and entertainment industry has always been a natural fit for Backblaze. Some of our first Backblaze Computer Backup customers were creative professionals looking to protect their work, and the launch of Backblaze B2 opened up new options for archiving, backing up, and distributing media assets.

As the media and entertainment industry moved to 4K Ultra HD for digital video recording over the past few years, file sizes ballooned. An hour of high-quality 4K video shot at 60 frames per second can require up to one terabyte of storage. Backblaze B2 matches well with today’s media and entertainment storage demands, as customers such as Fortune Media, Complex Networks, and Alton Brown of “Good Eats” fame have discovered.

Alongside Backblaze B2, an ecosystem of tools has emerged to help professionals manage their media assets, including iconik and LucidLink. iconik’s cloud-native media management and collaboration solution gathers and organizes media securely from a wide range of locations, including Backblaze B2. iconik can scan and index content from a Backblaze B2 bucket, creating an asset for each file. An iconik asset can combine a lower resolution proxy with a link to the original full-resolution file in Backblaze B2. For a large part of the process, the production team can work quickly and easily with these proxy files, previewing and selecting clips and editing them into a sequence.

Complementing iconik and B2 Cloud Storage, LucidLink provides a high-performance, cloud-native, network-attached storage (NAS) solution that allows professionals to collaborate on files stored in the cloud almost as if the files were on their local machine. With LucidLink, a production team can work with multi-terabyte 4K resolution video files, making final edits and rendering the finished product at full resolution.

It’s important to understand that the video editing process is non-destructive. The original video files are immutable—they are never altered during the production process. As the production team “edits” a sequence, they are actually creating a series of transformations that are applied to the original videos as the final product is rendered.

You can think of B2 Cloud Storage and LucidLink as tiers in a media storage architecture. Backblaze B2 excels at cost-effective, durable storage of full-resolution video assets through their entire lifetime from acquisition to archive, while LucidLink shines during the later stages of the production process, from when the team transitions to working with the original full-resolution files to the final rendering of the sequence for release.

iconik brings B2 Cloud Storage and LucidLink together; not only can an iconik asset include a proxy and links to copies of the original video in both B2 Cloud Storage and LucidLink, but iconik Storage Gateway can also copy the original file from Backblaze B2 to LucidLink when full-resolution work commences, and later delete the LucidLink copy at the end of the production process, leaving the original archived in Backblaze B2. All that’s missing is a little orchestration.

The Backblaze B2 Storage Plugin for iconik

The Backblaze B2 Storage Plugin for iconik allows creative professionals to copy files from B2 Cloud Storage to LucidLink, and later delete them from LucidLink, in a couple of mouse clicks. The plugin adds a pair of custom actions to iconik: “Add to LucidLink” and “Remove from LucidLink,” applicable to one or many assets or collections, accessible from the Search page and the Asset/Collection page. You can see them on the lower right of this screenshot:

The user experience could hardly be simpler, but there is a lot going on under the covers.

There are several components involved:

  • The plugin, deployed as a serverless function. The initial version of the plugin is written in Python for deployment on Google Cloud Functions, but it could easily be adapted for other serverless cloud platforms.
  • A LucidLink Filespace.
  • A machine with both the LucidLink client and iconik Storage Gateway installed. The iconik Storage Gateway accesses the LucidLink Filespace as if it were local file storage.
  • iconik, accessed both by the user via its web interface and by the plugin via the iconik API. iconik is configured with two iconik “storages”, one for Backblaze B2 and one for the iconik Storage Gateway instance.

When the user selects the “Add to LucidLink” custom action, iconik sends an HTTP request, containing the list of selected entities, to the plugin. The plugin calls the iconik API with a request to copy those entities from Backblaze B2 to the iconik Storage Gateway. The gateway writes the files to the LucidLink Filespace, exactly as if it were writing to the local disk, and the LucidLink client sends the files to LucidLink. Now the full-resolution files are available for the production team to access in the Filespace, while the originals remain in B2 Cloud Storage.

Later, when the user selects the “Remove from LucidLink” custom action, iconik sends another HTTP request containing the list of selected entities to the plugin. This time, the plugin has more work to do. Collections can contain other collections as well as assets, so the plugin must access each collection in turn, calling the iconik API for each file in the collection to request that it be deleted from the iconik Storage Gateway. The gateway simply deletes each file from the Filespace, and the LucidLink client relays those operations to LucidLink. Now the files are no longer stored in the Filespace, but the originals remain in B2 Cloud Storage, safely archived for future use.
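
To make the flow concrete, here is a minimal, hypothetical sketch of what a serverless handler for the “Add to LucidLink” action could look like in Python. This is not the actual plugin code (that lives in the GitHub repository linked below); the iconik endpoint path, payload fields, and environment variable names are illustrative placeholders only.

import os

import functions_framework
import requests

ICONIK_API = "https://app.iconik.io"        # iconik API base URL (assumed)
COPY_ENDPOINT = "/API/hypothetical/copy"    # placeholder path, not a real iconik route
HEADERS = {
    "App-ID": os.environ["ICONIK_APP_ID"],          # credentials supplied via environment
    "Auth-Token": os.environ["ICONIK_AUTH_TOKEN"],
}

@functions_framework.http
def add_to_lucidlink(request):
    """Handle the 'Add to LucidLink' custom action sent by iconik."""
    payload = request.get_json()
    for asset_id in payload.get("asset_ids", []):
        # Ask iconik to copy the original file from the Backblaze B2 storage to the
        # storage backed by the iconik Storage Gateway (i.e., the LucidLink Filespace).
        response = requests.post(
            f"{ICONIK_API}{COPY_ENDPOINT}",
            headers=HEADERS,
            json={
                "asset_id": asset_id,
                "destination_storage_id": os.environ["GATEWAY_STORAGE_ID"],
            },
            timeout=30,
        )
        response.raise_for_status()
    return ("", 204)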

This short video shows the plugin in action, and walks through the flow in a little more detail:

Deploying the Backblaze B2 Storage Plugin for iconik

The plugin is available open-source under the MIT license at https://github.com/backblaze-b2-samples/b2-iconik-plugin. Full deployment instructions are included in the plugin’s README file.

Don’t have a Backblaze B2 account? You can get started here, and the first 10GB are on us. We can also set up larger scale trials involving terabytes of storage—enter your details and we’ll get back to you right away.

Customize the Plugin to Your Requirements

You can use the plugin as is, or modify it to your requirements. For example, the plugin is written to be deployed on Google Cloud Functions, but you could adapt it to another serverless cloud platform. Please report any issues with the plugin via the issues tab in the GitHub repository, and feel free to submit contributions via pull requests.

The post Optimize Your Media Production Workflow With iconik, LucidLink, and Backblaze B2 appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

Looking Forward to Backblaze Cloud Replication: Everything You Need to Know

Post Syndicated from Pat Patterson original https://www.backblaze.com/blog/looking-forward-to-backblaze-cloud-replication-everything-you-need-to-know/

Backblaze Cloud Replication—currently in private beta—enables Backblaze customers to store files in multiple regions, or create multiple copies of files in one region, across the Backblaze Storage Cloud. This capability, as we explained in an earlier blog post, allows you to create geographically separate copies of data for compliance and continuity, keep data closer to its consumers, or maintain a live copy of production data for testing and staging. Today we’ll look at how you can get started with Cloud Replication, so you’ll be ready for its release, likely early next month.

Backblaze Cloud Replication: The Basics

Backblaze B2 Cloud Storage organizes data into files (equivalent to Amazon S3’s objects) in buckets. Very simply, Cloud Replication allows you to create rules that control replication of files from a source bucket to a destination bucket. The source and destination buckets can be in the same or different accounts, or in the same or different regions.

Here’s a simple example: Suppose I want to replicate files from my-production-bucket to my-staging-bucket in the same account, so I can run acceptance tests on an application with real-life data. Using either the Backblaze web interface or the B2 Native API, I would simply create a Cloud Replication rule specifying the source and destination buckets in my account. Let’s walk through a couple of examples in each interface.

Cloud Replication via the Web Interface

Log in to the account containing the source bucket for your replication rule. Note that the account must have a payment method configured to participate in replication. Cloud Replication will be accessible via a new item in the B2 Cloud Storage menu on the left of the web interface:

Clicking Cloud Replication opens a new page in the web interface:

Click Replicate Your Data to create a new replication rule:

Configuring Replication Within the Same Account

To implement the simple rule, “replicate files from my-production-bucket to my-staging-bucket in the same account,” all you need to do is select the source bucket, set the destination region the same as the source region, and select or create the destination bucket:

Configuring Replication to a Different Account

To replicate data via the web interface to a different account, you must be able to log in to the destination account. Click Authenticate an existing account to log in. Note that the destination account must be enabled for Backblaze B2 and, again, must have a payment method configured:

After authenticating, you must select a bucket in the destination account. The process is the same whether the destination account is in the same or a different region:

Note that, currently, you may configure a bucket as a source in a maximum of two replication rules. A bucket can be configured as a destination in any number of rules.

Once you’ve created the rule, it is accessible via the web interface. You can pause a running rule, run a paused rule, or delete the rule altogether:

Replicating Data

Once you have created the replication rule, you can manipulate files in the source bucket as you normally would. By default, existing files in the source bucket will be copied to the destination bucket. New files, and new versions of existing files, in the source bucket will be replicated regardless of whether they are created via the Backblaze S3 Compatible API, the B2 Native API, or the Backblaze web interface. Note that the replication engine runs on a distributed system, so the time to complete replication is based on the number of other replication jobs scheduled, the number of files to replicate, and the size of the files to replicate.
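
For example, once a rule is in place, an everyday upload to the source bucket needs no replication-specific code at all. Here is a minimal sketch using boto3 against the Backblaze S3 Compatible API; the endpoint, bucket name, and key values are placeholders for your own:

import boto3

# Point boto3 at your bucket's S3 Compatible API endpoint (shown on the Buckets
# page of the web interface); the key ID and application key are placeholders.
s3 = boto3.client(
    "s3",
    endpoint_url="https://s3.us-west-004.backblazeb2.com",
    aws_access_key_id="<application key id>",
    aws_secret_access_key="<application key>",
)

# An ordinary upload to the source bucket; the replication engine notices the new
# file and copies it to the destination bucket according to the rule.
s3.upload_file("report.pdf", "my-production-bucket", "reports/report.pdf")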

Checking Replication Status

Click on a source or destination file in the web interface to see its details page. The file’s replication status is at the bottom of the list of attributes:

There are four possible values of replication status:

  • pending: The file is in the process of being replicated. If there are two rules, at least one of the rules is processing. (Reminder: Currently, you may configure a bucket as a source in a maximum of two replication rules.) Check again later to see if it has left this status.
  • completed: This status represents a successful replication. If two rules are configured, both rules have completed successfully.
  • failed: A non-recoverable error has occurred, such as insufficient permissions to write the file into the destination bucket. The system will not try again to process this file. If two rules are configured, at least one has failed.
  • replica: This file was created by the replication process. Note that replica files cannot be used as the source for further replication.

Cloud Replication and Application Keys

There’s one more detail to examine in the web interface before we move on to the API. Creating a replication rule generates up to two Application Keys: one with read permissions for the source bucket (if the source bucket is not already associated with an Application Key), and one with write permissions for the destination bucket.

The keys are visible in the App Keys page of the web interface:

You don’t need to worry about these keys if you are using the web interface, but it is useful to see how the pieces fit together if you are planning to go on to use the B2 Native API to configure Cloud Replication.

This short video walks you through setting up Cloud Replication in the web interface:

Cloud Replication via the B2 Native API

Configuring cloud replication in the web interface is quick and easy for a single rule, but quickly becomes burdensome if you have to set up multiple replication rules. The B2 Native API allows you to programmatically create replication rules, enabling automation and providing access to two features not currently accessible via the web interface: setting a prefix to constrain the set of files to be replicated and excluding existing files from the replication rule.

Configuring Replication

To create a replication rule, you must include replicationConfiguration when you call b2_create_bucket or b2_update_bucket. The source bucket’s replicationConfiguration must contain asReplicationSource, and the destination bucket’s replicationConfiguration must contain asReplicationDestination. Note that both can be present where a given bucket is the source in one replication rule and the destination in another.

Let’s illustrate the process with a concrete example. Let’s say you want to replicate newly created files with the prefix master_data/, and new versions of those files, from a bucket in the U.S. West region to one in the EU Central region so that you have geographically separate copies of that data. You don’t want to replicate any files that already exist in the source bucket.

Assuming the buckets already exist, you would first create a pair of Application Keys: one in the source account, with read permissions for the source bucket, and another in the destination account, with write permissions for the destination bucket.

Next, call b2_update_bucket with the following message body to configure the source bucket:

{
    "accountId": "<source account id/>",
    "bucketId": "<source bucket id/>",
    "replicationConfiguration": {
        "asReplicationSource": {
            "replicationRules": [
                {
                    "destinationBucketId": "<destination bucket id>",
                    "fileNamePrefix": "master_data/",
                    "includeExistingFiles": false,
                    "isEnabled": true,
                    "priority": 1,
                    "replicationRuleName": "replicate-master-data"
                }
            ],
            "sourceApplicationKeyId": "<source application key id/>"
        }
    }
}

Finally, call b2_update_bucket with the following message body to configure the destination bucket:

{
  "accountId": "<destination account id>",
  "bucketId": "<destination bucket id>",
  "replicationConfiguration": {
    "asReplicationDestination": {
      "sourceToDestinationKeyMapping": {
        "<source application key id/>": "<destination application key id>"
      }
    },
    "asReplicationSource": null
  }
}
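
If you are scripting these calls, any HTTP client will do. Here is a minimal sketch in Python using the requests library, assuming the standard B2 Native API flow of authorizing first and then posting the message body shown above; the IDs and keys are placeholders, and error handling is kept to a minimum:

import requests

# Authorize against the account that owns the source bucket to obtain an API URL
# and an authorization token.
auth = requests.get(
    "https://api.backblazeb2.com/b2api/v2/b2_authorize_account",
    auth=("<source account application key id>", "<source account application key>"),
).json()

source_body = {
    "accountId": "<source account id>",
    "bucketId": "<source bucket id>",
    "replicationConfiguration": {
        "asReplicationSource": {
            "replicationRules": [
                {
                    "destinationBucketId": "<destination bucket id>",
                    "fileNamePrefix": "master_data/",
                    "includeExistingFiles": False,
                    "isEnabled": True,
                    "priority": 1,
                    "replicationRuleName": "replicate-master-data",
                }
            ],
            "sourceApplicationKeyId": "<source application key id>",
        }
    },
}

# Apply the replication configuration to the source bucket.
requests.post(
    auth["apiUrl"] + "/b2api/v2/b2_update_bucket",
    headers={"Authorization": auth["authorizationToken"]},
    json=source_body,
).raise_for_status()

# The destination bucket is configured the same way, using the second message body
# above and an authorization token obtained from the destination account.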

You can check your work in the web interface:

Note that the “file prefix” and “include existing files” configuration is not currently visible in the web interface.

Viewing Replication Rules

If you are planning to use the B2 Native API to set up replication rules, it’s a good idea to experiment with the web interface first and then call b2_list_buckets to examine the replicationConfiguration property.

Here’s an extract of the configuration of a bucket that is both a source and destination:

{
  "accountId": "e92db1923dce",
  "bucketId": "2e2982ddebf12932830d0c1e",
  ...
  "replicationConfiguration": {
    "isClientAuthorizedToRead": true,
    "value": {
      "asReplicationDestination": {
        "sourceToDestinationKeyMapping": {
          "000437047f876700000000005": "003e92db1923dce0000000004"
        }
      },
      "asReplicationSource": {
        "replicationRules": [
          {
            "destinationBucketId": "0463b7a0a467fff877f60710",
            "fileNamePrefix": "",
            "includeExistingFiles": true,
            "isEnabled": true,
            "priority": 1,
            "replicationRuleName": "replication-eu-to-us"
          }
        ],
        "sourceApplicationKeyId": "003e92db1923dce0000000003"
      }
    }
  },
  ...
}

Checking a File’s Replication Status

To see the replication status of a file, including whether the file is itself a replica, call b2_get_file_info and examine the replicationStatus field. For example, looking at the same file as in the web interface section above:

{
  ...
  "bucketId": "548377d0a467fff877f60710",
  ...
  "fileId": "4_z548377d0a467fff877f60710_f115587450d2c8336_d20220406_
m162741_c000_v0001066_t0046_u01649262461427",
  ...
  "fileName": "Logo Slide.png",
  ...
  "replicationStatus": "completed",
  ...
}
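
Programmatically, the same check is a single call. A short sketch using the requests library, with the same authorization pattern as the earlier example (the key and file IDs are placeholders):

import requests

# Authorize to obtain an API URL and token; key ID and key are placeholders.
auth = requests.get(
    "https://api.backblazeb2.com/b2api/v2/b2_authorize_account",
    auth=("<application key id>", "<application key>"),
).json()

# Ask for the file's metadata and read its replication status.
file_info = requests.post(
    auth["apiUrl"] + "/b2api/v2/b2_get_file_info",
    headers={"Authorization": auth["authorizationToken"]},
    json={"fileId": "<file id>"},
).json()

print(file_info.get("replicationStatus"))  # pending, completed, failed, or replica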

This short video runs through the various API calls:

How Much Will This Cost?

The majority of fees for Cloud Replication are identical to standard B2 Cloud Storage billing: You pay for the total data you store, for replication (download) fees, and for any related transaction fees. For details regarding billing, click here.

The replication fee is only incurred between cross-regional accounts. For example, a source in the U.S. West and a destination in EU Central would incur replication fees, which are priced identically to our standard download fee. If the replication rule is created within a region—for example, both source and destination are located in our U.S. West region—there is no replication fee.

How to Start Replicating

Watch the Backblaze Blog for an announcement when we make Backblaze Cloud Replication generally available (GA), likely early next month. As mentioned above, you will need to set up a payment method on accounts included in replication rules. If you don’t yet have a Backblaze B2 account, or you need to set up a Backblaze B2 account in a different region from your existing account, sign up here and remember to select the region from the dropdown before hitting “Sign Up for Backblaze B2.”

The post Looking Forward to Backblaze Cloud Replication: Everything You Need to Know appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

The Python GIL: Past, Present, and Future

Post Syndicated from Backblaze original https://www.backblaze.com/blog/the-python-gil-past-present-and-future/

Our team had some fun experimenting with Python 3.9-nogil, the results of which will be reported in an upcoming blog post. In the meantime, we saw an opportunity to dive deeper into the history of the global interpreter lock (GIL), including why it makes Python so easy to integrate with and the tradeoff between ease and performance.
 
We reached out to Barry Warsaw, a preeminent Python developer and contributor, because we could think of no one better to break down the evolution of the GIL for us. Barry is a longtime Python core developer, former release manager and steering council member, and PSF Fellow. He was project lead for the GNU Mailman mailing list manager. Barry, along with contributor Paweł Polewicz, a backend software developer and longtime Python user, went above and beyond anything we could have imagined, developing this comprehensive deep dive into the GIL and its evolution over the years. Thanks also go to Larry Hastings for his review and feedback.
 
If Python’s GIL is something you are curious about, we’d love to hear your thoughts in the comments. We’ll let Barry take it from here.
 
—The Editors

First Things First: What Is the GIL?

The Python GIL, or Global Interpreter Lock, is a mechanism in CPython (the most common implementation of Python) that serves to serialize operations involving the Python bytecode interpreter, and provides useful safety guarantees for internal object and interpreter state. While providing many benefits, as the discussion below will show, the GIL also prevents CPython from achieving full multicore performance.

In simplest terms, the GIL is a lock (or mutex) that allows only a single operating system thread to run the central Python bytecode interpreter loop. Normally, when multiple threads can access shared state, such as global interpreter or object internal state, a programmer would need to implement fine-grained locks to prevent one thread from stomping on the state set by another thread. The GIL removes the need for these fine-grained locks because it imposes a global lock that prevents multiple threads from mutating this state at the same time.

In this post, I’ll explore the pros and cons of the GIL, and the many efforts over the years to remove it, including some recent exciting developments.

Humble Beginnings

Back in November 1994, I was invited to a little gathering of programming language enthusiasts to meet the Dutch inventor of a relatively new and little known object-oriented language. This three-day workshop was organized by my friends and former colleagues at the National Institute of Standards and Technology (NIST) in Gaithersburg, MD. I came with extensive experience in languages from C, C++, FORTH, LISP, Perl, TCL, and Objective-C and enjoyed learning and playing with new programming languages.

Of course, the Dutch inventor was Guido van Rossum and his little language was Python. I think most of us in attendance knew there was something special about Python and Guido, but it probably would have shocked us to know that Python would even be around almost 30 years later, let alone have the scope, impact, or popularity it enjoys today. For me personally, it was a life-changing moment.

A few years ago, I gave a talk at BayPiggies that took a retrospective look at the evolution of Python from version 1.1 in October 1994 (just before the abovementioned workshop), through the Python 2 series, and up to Python 3.7, the newest release of the language at the time. In many ways, Python 1.1 would be recognizable by today’s modern Python programmer. In other ways, you’d wonder how Python was ever usable without features that were introduced in the intervening years.

Can you imagine not having the tuple() or list() built-ins, or docstrings, or class exceptions, keyword arguments, *args, **kws, packages, or even different operators for assignment and equality tests? It was fun to go back through all those old changelogs and remember what it was like as each of the features we now take for granted were introduced, often in those early days with absolutely no regard for backward compatibility.

I managed to find the agenda for that first Python workshop, and one of the items to be discussed was “Improving the efficiency of Python (e.g., by using a different garbage collection scheme).” I don’t remember any of the details of that discussion, but even then, and from its start, Python employed a reference counting memory management scheme (the cyclic garbage detector being many years away yet). Reference counting is a simple way of managing your objects in a higher level language where you don’t directly allocate or free your memory. One of Guido’s early guiding principles for Python, and which has served Python well over the years, is to keep it as simple as possible while still being effective, useful, and fun.

The Basics of Reference Counting

Reference counting is simple; as it says on the tin, the interpreter keeps a counter that tracks every reference to an object. For example, binding an object to a variable (such as by an assignment) increases that object’s reference count by one. Appending an object to a list also increases its reference count by one. Removing an object from the list decreases that object’s reference count by one. When a variable goes out of scope, the reference count of the object the variable is bound to is decreased by one again. We call this reference count the object’s “refcount” and these two operations “incref” and “decref” respectively.

When an object’s refcount goes to zero, it means there are no more live references to the object, so it can be safely freed (and finalized) because nothing in the program can reach that object anymore. As these objects are deallocated, any references to objects they hold are also decref’d, and so on. Refcounting gives the Python interpreter a very simple mechanism for freeing garbage and, more importantly, it allows humans to reason about Python’s memory management, both from the point of view of the Python programmer and from the vantage point of the C extension writer, who doesn’t have the luxury of all that reference counting happening automatically.
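
You can watch reference counts change from Python itself. A small sketch using sys.getrefcount(), which reports a count one higher than you might expect because passing the object as an argument temporarily adds a reference:

import sys

obj = object()
print(sys.getrefcount(obj))   # typically 2: the variable binding plus the call argument

container = [obj, obj]        # each reference stored in the list increfs the object
print(sys.getrefcount(obj))   # two higher than before

del container                 # dropping the list decrefs the object again
print(sys.getrefcount(obj))   # back to the original count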

This is a crucial point: When we talk about “Python” we generally mean “CPython,” the implementation of the runtime written in C. The C programmer working on the CPython runtime and the module author writing extensions for Python in C (for performance or to integrate with some system library) do have to worry about all the nitty-gritty details of when to incref or decref an object. Get this wrong and your extension can leak memory or double-free an object, either way wreaking havoc on your system. Fortunately, Python has clear rules to follow and good documentation, but it can still be difficult to get refcounting right in complex situations, such as when proper error handling leads to multiple exit paths from a function.

Here’s Where the GIL Comes In: Reference Counting and Concurrency

One of the key simplifying rules is that the programmer doesn’t have to worry about concurrency when managing Python reference counting. Think about the situation where you have multiple threads, each inserting and removing a Python object from a collection such as a list or dictionary. Because those threads may run at any time and in any order, you would normally have to be extremely defensive in how you incref and decref those objects, and it would be way too easy to get this wrong. You could crash Python, or worse, if you didn’t implement the proper locks around your incref and decref operations. Having to worry about all that would make your C code very complicated and likely pretty error-prone. The CPython implementation also has global and static variables that are vulnerable to race conditions.

In keeping with Python’s principles, in 1992, when Guido first began to implement threading support in Python, he utilized a simple mechanism to keep this manageable for a wide range of Python programmers and extension authors: a Global Interpreter Lock—the infamous GIL!

Because the Python interpreter itself is not thread-safe, the GIL allows only one thread to execute Python bytecode at a time, and thus serializes all access to Python objects. So, barring bugs, it is impossible for multiple threads to stomp on each other’s reference count operations. There are C API functions to release and acquire the GIL around blocking I/O or compute intensive functions that don’t touch Python objects, and these provide boundaries for the interpreter to switch to other Python-executing threads.

Two threads incrementing an object reference counter.

Thus, we gain significant C implementation simplicity at the expense of some parallelism. Modern Python has many ways to work around this limitation, from asyncio to subprocesses and multiprocessing, which all work fine if they align with your requirements. Python also surfaces operating system threading primitives, but these can’t take full advantage of multicore operations because of the GIL.
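
A quick way to feel this limitation is to run the same CPU-bound work on threads and then on processes. A rough sketch (exact timings vary by machine; on a multicore system the process pool typically finishes much faster because each process has its own interpreter and GIL):

import time
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor

def busy(n):
    # Pure-Python, CPU-bound work that never releases the GIL.
    total = 0
    for i in range(n):
        total += i * i
    return total

def timed(executor_cls, label):
    start = time.perf_counter()
    with executor_cls(max_workers=4) as pool:
        list(pool.map(busy, [5_000_000] * 4))
    print(f"{label}: {time.perf_counter() - start:.2f}s")

if __name__ == "__main__":
    timed(ThreadPoolExecutor, "threads (serialized by the GIL)")
    timed(ProcessPoolExecutor, "processes (one interpreter and GIL each)")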

Advantages of the GIL

Back in the early days of Python, we didn’t have the prevalence of multicore processors, so this all worked fine. These days, modern programming languages are more multicore friendly, and the GIL gets a bad rap. Before we explore the work to remove the GIL, it’s important to understand just how much benefit and mileage Python has gotten out of it.

One important aspect of the GIL is that it simplifies the programming model for extension module authors. When writing extension modules in C, C++, or any other low-level language with access to the internals of the Python interpreter, extension authors would normally have to ensure that there are no race conditions that could corrupt the internal state of Python objects. Concurrency is hard to get right, especially so in low-level languages, and one mistake can corrupt the entire state of the interpreter. For an extension author, it can already be challenging to ensure that all increfs and decrefs are properly balanced, especially for any branches, early exits, or error conditions, and this would be monumentally more difficult if the author also had to contend with concurrent execution. The GIL provides an important simplifying model of object access (including refcount manipulation) because it ensures that only one thread of execution can mutate Python objects at a time.

There are important performance benefits of the GIL for single-threaded operations as well. Without the GIL, Python would need some other way of ensuring that object refcounts are safe from corruption due to, for example, race conditions between threads, such as when adding or removing objects from any mutable collection (lists, dictionaries, sets) that is shared across threads. These techniques can be very expensive, as some of the experiments described later showed. Ensuring that the Python interpreter is safe for multithreaded use cases degrades its performance for the single-threaded use case. The GIL’s low performance overhead really shines for single-threaded operations, including I/O-multiplexed programs where libraries like asyncio are used, and this is still a predominant use of Python. Finer-grained locks also increase the chances of deadlock, which isn’t possible with the GIL.

Also, one of the reasons Python is so popular today is that it had so many extensions written for it over the years. One of the reasons there are so many powerful extension modules, whether we like to admit it or not, is that the GIL makes those extensions easier to write.

And yet, Python programmers have long dreamed of being able to run multithreaded Python programs to take full advantage of all the cores available on modern computing platforms. Even today’s watches and phones have multiple cores, whereas in Python’s early days, multicore systems were rare. Here we are 30 or so years later, and while the GIL has served Python well, in order to take advantage of what clearly seems to be more than a passing fad, Python’s GIL often gets in the way of true high-performance multithreaded concurrency.

Attempting to Remove the GIL

Two threads incrementing object reference counter without GIL protection.

Over the years, many attempts have been made to remove the GIL.

1999: Greg Stein’s “Free Threading”

Circa 1999, Greg Stein’s “free threading” work was one of the first (successful!) attempts to remove the GIL. It made the locks much more fine-grained and moved global variables inside the interpreter into a structure, which we actually still use today. It had the unfortunate side effect, however, of making your Python code multiple times slower. Thus, while the free threading work was a great experiment, it was far too impractical to adopt.

2015: Larry Hastings’s Gilectomy

Years later (circa 2015), Larry Hastings’s wonderfully named Gilectomy project tried a different approach to remove the GIL. In Larry’s PyCon 2016 talk, he discusses four technical considerations that must be addressed when removing the GIL:

  1. Reference Counting: Race conditions on updating the refcount between multiple threads as described previously.
  2. Globals and Statics: These include interpreter global housekeeping variables, and shared singleton objects. Much work has been done over the years to move these globals into per-thread structures. Eric Snow’s work on multiple interpreters (aka “subinterpreters”) has also made a lot of progress on isolating these variables into structures that represent an interpreter “instance” where theoretically each instance could run on a separate core. There are even proposals for making some of those shared singleton objects immortal, such that reference counting race conditions would have no effect on the lifetime of those objects. An interesting related proposal would move the GIL into a per-interpreter data structure, which could lead to the ability to run an isolated interpreter instance per core (with limitations).
  3. C Extensions: Keep in mind that there is a huge ecosystem of C extension modules, and much of Python’s power comes from these extension modules, of which NumPy is a hugely popular example. These extensions have never had to worry about parallelism or re-entrancy because they’ve always relied on the GIL to serialize their operations. At a minimum, a GIL-less Python will require recompilation of extension modules, and some or all may require some level of source code modifications as well. These changes may include protecting internal (non-Python) data structures for concurrency, using functional APIs for refcount modification instead of accessing refcount fields directly, not assuming that Python collections are stable over iteration, etc.
  4. Atomicity: Operations such as adding or deleting objects from Python collections such as lists and dictionaries actually involve a number of steps internally. To the Python developer, these all appear to be atomic operations, and in fact they are, thanks to the GIL.
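
A tiny sketch makes that last point concrete: many threads can append to the same list without ever corrupting it or dropping an element, because each append runs while the GIL is held. (Note that compound operations, such as incrementing a shared integer with x += 1, are not protected in the same way and still need a lock.)

    # Illustrating point 4: list.append is effectively atomic under the GIL,
    # so concurrent appends never corrupt the list or lose elements.
    import threading

    shared = []

    def worker(count):
        for i in range(count):
            shared.append(i)

    threads = [threading.Thread(target=worker, args=(100_000,)) for _ in range(8)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    print(len(shared))  # Always 800000 on a GIL-protected CPython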

Larry also identifies what he calls three “political” considerations, but which I think are more in the realm of the social contract between Python developers and Python users:

  1. Removing the GIL should not hurt performance for single-threaded or I/O-bound multithreaded code.
  2. We can’t break existing C extensions, as described above.[6]
  3. Don’t let GIL removal make the CPython interpreter too complicated or difficult to understand. One of Guido’s guiding principles, and a subtle reason for Python’s huge success, is that even with complicated features such as exception handling, asyncio, generators, etc., Python’s C core is still relatively easy to learn and understand. This makes it easy for new contributors to engage with Python core development, an absolutely essential quality if you want your language to thrive and grow for its next 30 years as much as it has for its previous 30.

Larry’s Gilectomy work is quite impressive, and I highly recommend watching any of his PyCon talks for deep technical dives, served with a healthy dose of humor. As Larry points out, removing the GIL isn’t actually the hard part. The hard part is doing so while adhering to the above-mentioned technical and social constraints, retaining Python’s single-threaded performance, and building a mechanism that scales with the number of cores. This latter constraint is important because if we’re going to enable multicore operations, we want to ensure that Python’s performance doesn’t hit a plateau at four or eight cores.

So, why did the Gilectomy branch fail (measured in units of “didn’t get adopted by CPython”)? For the most part, the performance and complexity constraints couldn’t be met. One of the biggest hits on performance wasn’t actually lock contention on objects. The early Gilectomy work relied on atomic increment and decrement CPU instructions, which destroyed cache consistency, and caused a high overhead of communication on the intercore bus to ensure atomicity.

Intercore atomic incr/decr communication.

Later, Larry experimented with a technique borrowed from garbage collection research called “buffered reference counting,” essentially a transaction log for refcount changes. However, contention on transaction logs required further modifications to segregate logs by threads and by increment and decrement operations. This led to non-realtime garbage collection events on refcounts reaching zero, which broke features such as Python’s weakref objects.

Interestingly, another hotspot turned out to be what’s called “obmalloc,” which is a small block allocator that improves performance over just using system malloc for everything. We’ll touch on this again later. Solving all these knock-on effects (such as repairing the cyclic garbage collector) led to increased complexity of the implementation, making it highly unlikely that the work would ever be merged into Python.

Before we leave this topic to look at some new and exciting work, let’s return briefly to Eric Snow’s work on multiple interpreters (aka subinterpreters). PEP 554 proposes to add a new standard library module called “interpreters” which would expose the underlying work that Eric has been doing to isolate interpreter state out of global variables internal to CPython. One such piece of global state is, of course, the GIL. With or without Python-level access to these features, if the GIL could be moved from global state to per-interpreter state, each interpreter instance could theoretically run concurrently with the others. You could therefore attach a different interpreter instance to each thread, and these could run Python code in parallel. This is definitely a work in progress and it’s unclear whether multiple interpreters will deliver on the promise of this kind of limited concurrency. I say “limited” because without full GIL removal, there is significant complexity in sharing Python objects between interpreters, which would almost certainly be necessary. Issues such as ownership (which thread owns which object) and safe mutability would need to be resolved. PEP 554 proposes some solutions to these problems and more, so we’ll have to keep an eye on this work. But even multiple interpreters don’t provide the same true concurrency that full GIL removal promises.

The Future of the GIL: Where Do We Go From Here?

And now we come full circle, because Python’s popularity, vast influence, and reach are also among the reasons why it still seems impossible to remove the GIL while retaining single-threaded performance and not breaking the entire ecosystem of extension modules.

Yet here we are with PyCon 2022 just concluded, and there is renewed excitement for Sam Gross’ “nogil” work, which holds the promise of a performant, GIL-less CPython with minimal backward incompatibilities at both the Python and C layers. While some performance regressions are inevitable, Sam’s work also utilizes a number of clever techniques to claw these regressions back through other internal performance improvements.

Two threads incrementing object reference counter on Sam Gross’ “nogil” branch.

With these improvements as well as the work that Guido’s team at Microsoft is doing with its Faster CPython project, there is renewed hope and excitement that the GIL can be removed while retaining or even improving overall performance, and not giving up on backward compatibility. It will clearly be a multi-year effort.

Sam’s nogil project aims to support a concurrency sweet spot. It promises that data race conditions will never corrupt Python’s virtual machine, but it leaves the integrity of user-level data structures to the programmer. Concurrency is hard, and many Python programs and libraries benefit from the implicit GIL constraints, but solving this is a harder problem outside the scope of the nogil project. Data science applications are one big potential domain to benefit from true multiprocessor enabled concurrency in Python.

There are a number of techniques that the nogil project utilizes to remove the GIL bottleneck. As mentioned, the project also employs a number of other virtual machine improvements to regain some of the performance inevitably lost by removing the GIL. I won’t go into too much detail about these improvements, but it’s helpful to note that where these are independent of nogil, they can be, and are being, investigated along with other work Guido’s team is doing to improve the overall performance of CPython.

Python 3.11 recently entered beta (and thus feature freeze), and with it we’ll see significant performance improvements, which no doubt will continue in future Python releases. When and if nogil is adopted, some of those performance gains may regress to support nogil. Whether and how this will be a good trade-off will be an interesting point of analysis and debate in the coming years. In Sam’s original paper, he proposes a runtime switch to choose between nogil and normal GIL operation; however, this was discussed at the PyCon 2022 Language Summit, and the consensus was that it wouldn’t be practical. Thus, as the nogil experiment moves forward, it will be enabled by a compile-time switch.

At a high level, the removal of the GIL is afforded by changes in three areas: the memory allocator, reference counting, and concurrent collection protections. Each of these are deep topics on their own, so we’ll only be able to touch on them briefly.

nogil Part 1: Memory Allocators

Because everything in Python is an object, and most objects are dynamically allocated on the heap, the CPython interpreter implements several levels of memory allocators, and provides C API functions for allocating and freeing memory. This allows it to efficiently allocate blocks of raw memory from the operating system, and to subdivide and manage those blocks based on the type of objects being placed into them. For example, integers have different memory requirements than dictionaries, so having object-specific memory managers for these (and other) types of objects makes memory management inside the interpreter much more efficient.

CPython also employs a small object allocator, called pymalloc, which improves performance for allocating and freeing objects smaller than or equal to 512 bytes. This only touches on the complexities of memory management inside the interpreter. The point of all this complexity is to enable more efficient object creation and destruction, but it also allows for features like memory allocation debugging and custom memory allocators.

The nogil work takes advantage of this pluggability to utilize a general-purpose, highly efficient, thread-safe memory allocator developed by Daan Leijen at Microsoft called mimalloc. mimalloc itself is worthy of an in-depth look, but for our purposes it’s enough to know that the mimalloc design is extremely well tuned to efficient and thread-safe allocation of memory blocks. The nogil project utilizes these structures for the implementation of dictionaries and other collection types which minimize the need for locks on non-mutating access, as well as managing garbage collected objects[7] with minimal bookkeeping. mimalloc has also been highly tuned for performance and thread-safety.

nogil Part 2: Reference Counting

nogil also makes several changes to reference counting. It does so in a clever way that minimizes changes to the Limited C API, although it does not preserve the stable ABI. This means that while extension modules must be recompiled, their source code may not require modification outside of a few known corner cases.[8]

One very promising idea is to make some objects effectively immortal, which I touched on earlier. True, False, None and some other objects in practice never actually see their refcounts go to zero, and so they stay alive for the entire lifetime of the Python process. By utilizing the least significant bits of the object’s reference count field for bookkeeping, nogil can make the refcounting macros no-op for these objects, thus avoiding all contention across threads for these fields.
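
You can get a feel for why immortality is attractive with a one-liner: the refcount of a singleton like None is already enormous, because it is referenced from every loaded module and from all over the interpreter, and in practice it never approaches zero.

    # A tiny illustration of why True, False, and None are candidates for
    # immortality: their refcounts are huge and never realistically hit zero.
    import sys

    print(sys.getrefcount(None))  # typically thousands of references
    print(sys.getrefcount(True))
    # Under the immortal-object scheme, incref/decref become no-ops for these
    # objects, so threads never contend on their refcount fields at all.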

nogil uses a form of biased reference counting to split an object’s refcount into two buckets. For refcount changes in the thread that owns the object, these “local” changes can be made by the more efficient conventional (non-atomic) forms. For changing the refcount of objects in a different thread, an atomic operation is necessary for safe concurrent modification of a “shared” refcount. The thread that owns the object can then combine this local and shared refcount for garbage collection purposes, and it can give up ownership when its local refcount goes to zero. This is performant when most object accesses are local to the owning thread, which is generally the case. nogil’s biased reference counting scheme can utilize mimalloc’s memory pools to efficiently keep track of the owning threads.

However, some objects are typically owned by multiple threads and are not immortal, and for these types of objects (e.g., functions, modules), a deferred reference counting scheme is employed. Incref and decref act as normal for these objects, but when the interpreter loads these objects onto its internal stack, the refcounts are not modified. The utility of this technique is limited to objects that are only deallocated during garbage collection because they are typically involved in reference cycles.

The garbage collector is also modified to ensure that it only runs at safe boundary points, such as a bytecode execution boundary. The current nogil implementation of garbage collection is single-threaded and stops the world, so it is thread-safe. It repurposes some of the existing C API functions to ensure that it doesn’t wait on threads that are blocked on I/O.

nogil Part 3: Concurrent Collection Protections

The third high-level technique that nogil uses to enable concurrency is to implement an efficient algorithm for locking container objects, such as dictionaries and lists, when mutating them. To maintain thread-safety, there’s just no way around employing locks for this. However, nogil optimizes for objects that are primarily modified in a single thread, and it admits that objects which are frequently and concurrently modified may need a different design.

Sam’s nogil paper goes into considerable detail about the locking algorithm, but at a high level it relies on container versioning (where every modification to a container bumps a “version” counter so the various read accesses can know whether the container has been modified between distinct reads or not), biased reference counting, and various mimalloc features to optimize for fast track, single-threaded, no modification reads while amortizing the cost of locking for writes against the other expensive operations a typical container write operation imposes.

The Last Word and Some Predictions

Sam Gross’ nogil project is impressive. He’s managed to satisfy most of the difficult constraints that have thwarted previous attempts at removing the GIL, including minimizing as much as possible the impact on single-threaded performance (and trading general interpreter performance improvements for the cost of removing the GIL), maintaining (mostly) Python’s C API backward compatibility to not force changes on the entire extension module ecosystem, and all the while (despite the length of this article!) preserving the readability and comprehensibility of the CPython interpreter.

You’ve no doubt noticed that the rabbit hole goes pretty deep, and we’ve only explored some of the tunnels in this particular burrow. Fortunately, Python’s semantics and CPython’s implementation have been well documented over their 30-year life, so there are plenty of opportunities for self-exploration…and contributions! It will take sustained engagement through careful and incremental steps to bring these ideas to fruition. The future certainly is exciting.

If I had to guess, I would say that we’ll see features like multiple interpreters provide some concurrency value in the next release or so, with GIL removal five years (and thus five releases) or more away. However, many of the techniques described here are already being experimented with and may show up earlier. Python 3.11 will have many noticeable performance improvements, with plenty of room for additional performance work in future releases. These will give the nogil work room to continue its experimentation with true multicore performance.

For a language and interpreter that has gone from a small group of lucky and prescient enthusiasts to a worldwide top-tier programming language, I think there is more excitement and optimism for Python’s future than ever. And that’s not even talking about game changers such as PyScript.

Stay tuned for a post that introduces the performance experiments the Backblaze team has done with Python 3.9-nogil and Backblaze B2 Cloud Storage. Have you experimented with Python 3.9-nogil? Let us know in the comments.

Barry Warsaw

Barry has been a Python core developer since 1994 and is listed as the first non-Dutch contributor to Python. He worked with Python’s inventor, Guido van Rossum, at CNRI when Guido, and Python development, moved from the Netherlands to the USA. He has been a Python release manager and steering council member, created and named the Python Enhancement Proposal (PEP) process, and is involved in Python development to this day. He was the project leader for GNU Mailman, and for a while maintained Jython, the implementation of Python built on the JVM. He is currently a senior staff engineer at LinkedIn, a semiprofessional bass player, and a tai chi enthusiast. All opinions and commentary expressed in this article are his own.

Paweł Polewicz

Paweł has been a backend developer since 2002. He built the largest e-radio station on the planet in 2006-2007, worked as a QA manager for six years, and finally started Reef Technologies, a software house highly specialized in building Python backends for startups.

Notes

  1. Reference cycles are not only possible but surprisingly common, and these can keep graphs of unreachable objects alive indefinitely. Python 2.0 added a generational cyclic garbage collector to handle these cases. The details are tricky and worthy of an article in its own right.
  2. CPython is also called the “reference implementation” because new features show up there first, even though they are defined for the generic “Python language.” It’s also the most popular implementation, and typically what people think of when they say “Python.”
  3. Much work has been done over the years to reduce these as much as possible.
  4. It’s even worse than this implies. Debugging concurrency problems is notoriously difficult because the conditions that lead to the bug are nearly impossible to reproduce, and few tools exist to help.
  5. Instrumenting concurrent code to try to capture the behavior can introduce subtle timing differences that hide the problem. The industry has even coined the term, “Heisenbug,” to describe the complexity of this class of bug.
  6. Some extension modules also use the GIL as a conveniently available mutex to protect concurrent access to their own, non-Python resources.
  7. It doesn’t seem possible to completely satisfy this constraint in any attempt to remove the GIL.
  8. I.e., the aforementioned cyclic reference garbage collector.
  9. Such as when the extension module peeks and pokes inside CPython data structures directly or via various macros, instead of using the C API’s functional interfaces.

The post The Python GIL: Past, Present, and Future appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

Go Serverless with Rising Cloud and Backblaze B2

Post Syndicated from Pat Patterson original https://www.backblaze.com/blog/go-serverless-with-rising-cloud-and-backblaze-b2/

In my last blog post, I explained how to use a Cloudflare Worker to send notifications on Backblaze B2 events. That post focused on how a Worker could proxy requests to Backblaze B2 Cloud Storage, sending a notification to a webhook at Pipedream that logged each request to a Google Spreadsheet.

Developers integrating applications and solutions with Backblaze B2 can use the same technique to solve a wide variety of use cases. As an example, in this blog post, I’ll explain how you can use that same Cloudflare Worker to trigger a serverless function at our partner Rising Cloud that automatically creates thumbnails as images are uploaded to a Backblaze B2 bucket, without incurring any egress fees for retrieving the full-size images.

What is Rising Cloud?

Rising Cloud hosts customer applications on a cloud platform that it describes as Intelligent-Workloads-as-a-Service. You package your application as a Linux executable or a Docker-style container, and Rising Cloud provisions instances as your application receives HTTP requests. If you’re familiar with AWS Lambda, Rising Cloud satisfies the same set of use cases while providing more intelligent auto-scaling, greater flexibility in application packaging, multi-cloud resiliency, and lower cost.

Rising Cloud’s platform uses artificial intelligence to predict when your application is expected to receive heavy traffic volumes and scales up server resources by provisioning new instances of your application in advance of when they are needed. Similarly, when your traffic is low, Rising Cloud spins down resources.

So far, so good, but, as we all know, artificial intelligence is not perfect. What happens when Rising Cloud’s algorithm predicts a rise in traffic and provisions new instances, but that traffic doesn’t arrive? Well, Rising Cloud picks up the tab—you only pay for the resources your application actually uses.

As is common with most cloud platforms, Rising Cloud applications must be stateless—that is, they cannot themselves maintain state from one request to the next. If your application needs to maintain state, you have to bring your own data store. Our use case, creating image thumbnails, is a perfect match for this model. Each thumbnail creation is a self-contained operation and has no effect on any other task.

Creating Image Thumbnails on Demand

As I explained in the previous post, the Cloudflare Worker will send a notification to a configured webhook URL for each operation that it proxies to Backblaze B2 via the Backblaze S3 Compatible API. That notification contains JSON-formatted metadata regarding the bucket, file, and operation. For example, on an image download, the notification looks like this:

{
    "contentLength": 3015523,
    "contentType": "image/png",
    "method": "GET",
    "signatureTimestamp": "20220224T193204Z",
    "status": 200,
    "url": "https://s3.us-west-001.backblazeb2.com/my-bucket/image001.png"
}

If the metadata indicates an image upload (i.e. the method is PUT, the content type starts with image, and so on), the Rising Cloud app will retrieve the full-size image from the Backblaze B2 bucket, create a thumbnail image, and write that image back to the same bucket, modifying the filename to distinguish it from the original.
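
In other words, the first thing the app does is apply a simple filter to the notification payload. Here is a minimal sketch of that check, written in Python for illustration (the demo app itself is written in JavaScript, as described below); the field names come from the notification format shown above, and the exact set of conditions is up to you.

    # A sketch of the "should we build a thumbnail?" check, using fields from
    # the notification payload shown above. The real app does the same thing
    # in JavaScript; the exact conditions here are illustrative.
    def is_image_upload(notification):
        return (
            notification.get("method") == "PUT"
            and notification.get("contentType", "").startswith("image/")
            and notification.get("status") == 200
        )

    example = {
        "contentLength": 3015523,
        "contentType": "image/png",
        "method": "PUT",
        "status": 200,
        "url": "https://s3.us-west-001.backblazeb2.com/my-bucket/image001.png",
    }
    print(is_image_upload(example))  # True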

Here’s the message flow between the user’s app, the Cloudflare Worker, Backblaze B2, and the Rising Cloud app:

  1. A user uploads an image in a Backblaze B2 client application.
  2. The client app creates a signed upload request, exactly as it would for Backblaze B2, but sends it to the Cloudflare Worker rather than directly to Backblaze B2.
  3. The Worker validates the client’s signature and creates its own signed request.
  4. The Worker sends the signed request to Backblaze B2.
  5. Backblaze B2 validates the signature and processes the upload.
  6. Backblaze B2 returns the response to the Worker.
  7. The Worker forwards the response to the client app.
  8. The Worker sends a notification to the Rising Cloud Web Service.
  9. The Web Service downloads the image from Backblaze B2.
  10. The Web Service creates a thumbnail for the image.
  11. The Web Service uploads the thumbnail to Backblaze B2.

These steps are illustrated in the diagram below.

I decided to write the application in JavaScript, since the Node.js runtime environment and its Express web application framework are well-suited to handling HTTP requests. Also, the open-source Sharp Node.js module performs this type of image processing task 4x-5x faster than either ImageMagick or GraphicsMagick. The source code is available on GitHub.

The entire JavaScript application is less than 150 lines of well-commented JavaScript and uses the AWS SDK’s S3 client library to interact with Backblaze B2 via the Backblaze S3 Compatible API. The core of the application is quite straightforward:

    // Get the image from B2 (returns a readable stream as the body)
    console.log(`Fetching image from ${inputUrl}`);
    const obj = await client.getObject({
      Bucket: bucket,
      Key: keyBase + (extension ? "." + extension : "")
    });

    // Create a Sharp transformer into which we can stream image data
    const transformer = sharp()
      .rotate()                // Auto-orient based on the EXIF Orientation tag
      .resize(RESIZE_OPTIONS); // Resize according to configured options

    // Pipe the image data into the transformer
    obj.Body.pipe(transformer);

    // We can read the transformer output into a buffer, since we know 
    // that thumbnails are small enough to fit in memory
    const thumbnail = await transformer.toBuffer();

    // Remove any extension from the incoming key and append '_tn.'
    const outputKey = path.parse(keyBase).name + TN_SUFFIX 
                        + (extension ? "." + extension : "");
    const outputUrl = B2_ENDPOINT + '/' + bucket + '/' 
                        + encodeURIComponent(outputKey);

    // Write the thumbnail buffer to the same B2 bucket as the original
    console.log(`Writing thumbnail to ${outputUrl}`);
    await client.putObject({
      Bucket: bucket,
      Key: outputKey,
      Body: thumbnail,
      ContentType: 'image/jpeg'
    });

    // We're done - reply with the thumbnail's URL
    response.json({
      thumbnail: outputUrl
    });

One thing you might notice in the above code is that neither the image nor the thumbnail is written to disk. The getObject() API provides a readable stream; the app passes that stream to the Sharp transformer, which reads the image data from B2 and creates the thumbnail in memory. This approach is much faster than downloading the image to a local file, running an image-processing tool such as ImageMagick to create the thumbnail on disk, then uploading the thumbnail to Backblaze B2.

Deploying a Rising Cloud Web Service

With my app written and tested running locally on my laptop, it was time to deploy it to Rising Cloud. There are two types of Rising Cloud applications: Web Services and Tasks. A Rising Cloud Web Service directly accepts HTTP requests and returns HTTP responses synchronously, with the condition that it must return an HTTP response within 44 seconds to avoid a timeout—an easy fit for my thumbnail creator app. If I was transcoding video, on the other hand, an operation that might take several minutes, or even hours, a Rising Cloud Task would be more suitable. A Rising Cloud Task is a queueable function, implemented as a Linux executable, which may not require millisecond-level response times.

Rising Cloud uses Docker-style containers to deploy, scale, and manage apps, so the next step was to package my app as a Docker image to deploy as a Rising Cloud Web Service by creating a Dockerfile.

With that done, I was able to configure my app with its Backblaze B2 Application Key and Key ID, endpoint, and the required dimensions for the thumbnail. As with many other cloud platforms, apps can be configured via environment variables. Using the AWS SDK’s variable names for the app’s Backblaze B2 credentials meant that I didn’t have to explicitly handle them in my code—the SDK automatically uses the variables if they are set in the environment.

Rising Cloud Environment

Notice also that the RESIZE_OPTIONS value is formatted as JSON, allowing maximum flexibility in configuring the resize operation. As you can see, I set the withoutEnlargement parameter as well as the desired width, so that images already smaller than the width would not be enlarged.

Calling a Rising Cloud Web Service

By default, Rising Cloud requires that app clients supply an API key with each request as an HTTP header with the name X-RisingCloud-Auth:

Rising Cloud Security

So, to test the Web Service, I used the curl command-line tool to send a POST request containing a JSON payload in the format emitted by the Cloudflare Worker and the API key:

curl -d @example-request.json \
	-H 'Content-Type: application/json' \
	-H 'X-RisingCloud-Auth: <your-api-key>' \
	https://b2-risingcloud-demo.risingcloud.app/thumbnail

As expected, the Web Service responded with the URL of the newly created thumbnail:

{
  "thumbnail":"https://s3.us-west-001.backblazeb2.com/my-bucket/image001_tn.jpg"
}

(JSON formatted for clarity)
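
If you prefer to test from a script rather than the command line, the equivalent request is just a few lines of Python; the endpoint is the same one used above, and the API key and payload file are placeholders.

    # A sketch of the same test using the requests library instead of curl.
    # The API key and payload file are placeholders.
    import json
    import requests

    with open("example-request.json") as f:
        payload = json.load(f)

    response = requests.post(
        "https://b2-risingcloud-demo.risingcloud.app/thumbnail",
        json=payload,
        headers={"X-RisingCloud-Auth": "<your-api-key>"},
    )
    print(response.json())  # e.g. {"thumbnail": "https://.../image001_tn.jpg"}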

The final piece of the puzzle was to create a Cloudflare Worker from the Backblaze B2 Proxy template, and add a line of code to include the Rising Cloud API key HTTP header in its notification. The Cloudflare Worker configuration includes its Backblaze B2 credentials, Backblaze B2 endpoint, Rising Cloud API key, and the Web Service endpoint (webhook):

Environment Variables

This short video shows the application in action, and how Rising Cloud spins up new instances to handle an influx of traffic.

Process Your Own B2 Files in Rising Cloud

You can deploy an application on Rising Cloud to respond to any Backblaze B2 operation(s). You might want to upload a standard set of files whenever a bucket is created, or keep an audit log of Backblaze B2 operations performed on a particular set of buckets. And, of course, you’re not limited to triggering your Rising Cloud application from a Cloudflare worker—your app can respond to any HTTP request to its endpoint.

Submit your details here to set up a free trial of Rising Cloud. If you’re not already building on Backblaze B2, sign up to create an account today—the first 10 GB of storage is free!

The post Go Serverless with Rising Cloud and Backblaze B2 appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

“We Were Stoked”: Santa Cruz Skateboards on Cloud Storage That Just Works

Post Syndicated from Molly Clancy original https://www.backblaze.com/blog/we-were-stoked-santa-cruz-skateboards-on-cloud-storage-that-just-works/

For a lot of us here at Backblaze, skateboarding culture permeated our most formative years. That’s why we were excited to hear from the folks at Santa Cruz Skateboards about how they use Backblaze B2 Cloud Storage to protect decades of skateboarding history. The company is the pinnacle of cool for millennials of a certain age, and, let’s face it, anyone not living under a rock since the mid-70s.

We got the chance to talk shop with Randall Vevea, Information Technology Specialist for Santa Cruz Skateboards, and he shared how they:

  • Implemented a cloud disaster recovery strategy to protect decades of data in a tsunami risk zone.
  • Created an automated production and VM backup solution using rclone.
  • Backed up more data affordably and efficiently in truly accessible storage.

Read on to learn how they did it.

Professional skater Fabiana Delfino.

Santa Cruz Skateboards: The Origin Story

It’s 1973 in sunny Santa Cruz, California. Three local guys—Richard Novak, Doug Haut, and Jay Shuirman—are selling raw fiberglass to the folks that make surfboards, boats, and race car parts. On a surf trip in Hawaii, the trio gets a request to throw together some skateboards. They make 500 and sell out immediately. Twice. Just like that, Santa Cruz Skateboards is born.
 
Fast forward to today, and Santa Cruz Skateboards is considered the backbone of skateboarding. For over five decades, the company has been putting out a steady stream of skateboards, apparel, accessories, and so much more, all emblazoned with the kinds of memorable art that have shaped skate culture.
 
Their video archives trace the evolution of skateboarding, following big name players, introducing rising stars, and documenting the events and competitions that connect the skate community all over the world, and it all needs to be protected, accessible, and organized.

A Little Storm Surge Can’t Stop Santa Cruz Skateboards

Randall estimates that the company stores about 40 terabytes of data just in art and media assets alone. Those files form an important historical archive, but the creative team is also constantly referencing and updating existing art—losing it isn’t an option. But potential data loss situations abound, particularly in the weather-prone area of Santa Cruz Harbor.

In January 2022, an underwater volcanic eruption off the coast of Tonga caused a tsunami that flooded Santa Cruz and did some $6 million in damage to the harbor. Businesses in the area are used to living with tsunami advisories (there was another scare just two years ago), but that doesn’t make dealing with the damage any easier. “The tsunami lit a fire under us to make sure that in the event that something were to go wrong here, we had our data somewhere else,” Randall said.

On top of weather threats, the pandemic forced Santa Cruz Skateboards to transition from a physical, on-premises setup to a more virtualized infrastructure that could support remote work. That transition was one of the main reasons Santa Cruz Skateboards started looking for a cloud data storage solution; it’s not just easier to back up that virtual machine data, but also to spin up those machines on a hypervisor in the event that something does go wrong.

Professional skater Justin Sommer.

Dropping in on a Major Bummer Called AWS Glacier

Before Randall joined Santa Cruz Skateboards, the company had been using AWS Glacier, a cold storage solution. “When I came on, Glacier was not in a working state,” Randall recalled. Data had been uploaded, but wasn’t syncing. “I’m not an AWS expert—I feel like you could go to school for four years and never learn all the inner workings of AWS. We needed a solution that we could implement quickly and without the hassle,” he said.

Glacier posed problems above and beyond that heavy lift, including:

  • Changes to the AWS architecture made Santa Cruz Skateboards’ data inaccessible.
  • Requests to download data timed out due to cold storage delays.
  • Endless support emails failed to answer questions or give Randall access to the data trapped in AWS’ black box.

“We were in a situation where we were paying AWS for nothing, basically,” Randall remembered. “I started looking around for different solutions and everywhere I turned, Backblaze was the answer.” Assuming it would take a long time, Randall started small with an FTP server and a local file server. Within two days, all that data was fully backed up. Impressed with those results, he contacted Backblaze for a more thorough introduction. “We were super stoked on something that just worked. I was able to deliver that to our executives and say look, our data is in Backblaze now. We don’t have to worry about this anymore,” Randall said.

“I feel like you could go to school for four years and never learn all the inner workings of AWS. We were in a situation where we were paying AWS for nothing, basically.”
—Randall Vevea, Information Technology Specialist, Santa Cruz Skateboards

Backups Are Like Helmets—They Let You Do the Big Things Better

When a project that Randall had expected to take three or four months was completed in one, Randall started to ask, “What else can we put in Backblaze?” They ended up expanding their scope considerably, including:

  • Decades of art, image, and video files.
  • Mission critical business files.
  • Virtual machine backups.
  • OneDrive backups.

All told, that amounted to about 60TB of data all managed by a small IT team supporting about 150 employees company-wide. In order to return his valuable time and attention to critical everyday IT tasks—everything from fixing printers to preventing ransomware attacks—Randall needed to find a backup solution that could run reliably in the background without much manual input or upkeep, and Backblaze delivered.

Today, Santa Cruz Skateboards uses two network attached storage devices that clone each other and both back up to the cloud using rclone, an open-source command line program that people use to manage or migrate content. Rclone is also able to handle the company’s complex file names with characters in foreign scripts, like files with names written in Chinese, for example, which solved Randall’s worry about mismatched data as the creative team pulls down files to work with art and other visual assets. He set up a Linux box as a backup manager, which he uses to run rclone cronjobs weekly. By the time Randall shows up to work on Monday mornings, the sync is complete.

With Backups Out of the Way, Santa Cruz Lives to Shred Another Day

“I like the fact that I don’t have to think about backups on a day-to-day basis.”
—Randall Vevea, Information Technology Specialist, Santa Cruz Skateboards

Now, all Randall has to do is check the logs to make sure everything is working as it should. With the backup process automated, there’s a long list of projects that the IT team can devote their time to.

Since making the move to Backblaze B2, Santa Cruz Skateboards is spending less to back up more data. “We have a lot more data in Backblaze than we ever thought we would have in AWS,” Randall said. “As far as cost savings, I think we’re spending about the same amount to store more data that we can actually access.”

The company’s creative team relies on the art and media assets that are now stored and always available in Backblaze B2. Now it’s easy to find and download the specific files should they need to restore them. Meanwhile, the IT team is relieved not to have to navigate AWS’ giant dashboards and complex issues of hot and cold storage with the Glacier service.

Santa Cruz Skateboards had been feeling like a small fish in the huge AWS pond, using a product that amounted to a single cog in a complex machine. Instead of having to divert his attention to research every time questions arise, Randall feels confident that he can rely on Backblaze to get his questions answered right away. “Personally, it’s a big lift off my shoulders,” he said. “Our data’s safe and sound and is getting backed up regularly, and I’m happy with that. I think everybody else is pretty happy with that, too.”

The Santa Cruz Skateboards team.

Is disaster recovery on your to-do list? Learn about our backup and archive solutions to safeguard your data against threats like natural disasters and ransomware.

The post “We Were Stoked”: Santa Cruz Skateboards on Cloud Storage That Just Works appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

Channel Partner Program Launch: Ease, Transparency, and Predictability for the Channel

Post Syndicated from Elton Carneiro original https://www.backblaze.com/blog/channel-partner-program-launch-ease-transparency-and-predictability-for-the-channel/

Since the early days of Backblaze B2 Cloud Storage, the advocacy that resellers and distributors have carried out in support of our products has been super important for us. Today, we can start to more fully return the favor: We are excited to announce the launch of our Channel Partner program.

In this program, we commit to delivering greater ease, transparency, and predictability to our Channel Partners through a suite of tools, resources, incentives, and benefits which will roll out over the balance of 2022. We’ve included the details below.

Read on to learn the specifics, or reach out to our Partner team today to schedule a meeting.

“When Backblaze expressed interest in working with CloudBlue Marketplace, we were excited to bring them into the fold. Their ease-of-use and affordable price point make them a great offering to our existing resellers, especially those in the traditional IT, MSP, and media & entertainment space.”
—Jess Warrington, General Manager, North America at CloudBlue

The Program’s Mission

This new program is designed to offer a simple and streamlined way for Channel Partners to do business with Backblaze. In this program, we are committed to three principles:

Ease

We’ll work consistently to simplify the way partners can do business with Backblaze, from recruitment to onboarding, and engagement to deal close. Work can be hard enough; we want working with us to feel easy.

Transparency

Openness and honesty are central to Backblaze’s business, and they will be in our dealings with partners as well. As we evolve the program, we’ll share our experiences and thoughts early and often, and we’ll encourage feedback and keep our doors open to your thoughts to inform how we can continue to improve the Channel Partner experience.

Predictability

Maintaining predictable pricing and a scalable capacity model for our resellers and distributors is central to this effort. We’ll also increasingly bundle additional features to answer all your customers’ cloud needs.

The Program’s Value

Making these new investments in our Channel Partner program is all about opening up the value of B2 Cloud Storage to more businesses. To achieve that, our team will help you to engage more customers, help those customers to build their businesses and accelerate their growth, and ultimately increase your profits.

Engage

Backblaze will drive joint marketing activities, provide co-branded collateral, and establish market development funds to drive demand.

Build

Any technology that supports S3-compatible storage can be paired with B2 Cloud Storage, and we continue to expand our Alliance Partner ecosystem—this means you can sell the industry-leading solutions your customers prefer paired with Backblaze B2.

Accelerate

Our products are differentiated by their ease of adoption and use, meaning they’ll be easy to serve to your customers for any use case: backup, archive or any object storage use case, and more—growing your topline revenue.

The Details

To deliver on the mission this program is aligned around, and the value it aims to deliver, our team has developed a collection of benefits, rewards, and resources. Many of these are available today, and some will come later this year (which we’ll clarify below). Importantly, we want to emphasize that this is just the beginning, and we will work to add to each of these lists over the coming months and years.

Benefits:

  • Deal registration.
  • Channel-exclusive product: Backblaze B2 Reserve.
  • Logo promotion on www.backblaze.com.
  • Joint marketing activities.

Rewards:

  • Rebates.
  • Seller incentives.
  • Market development funds (coming soon).

Resources:

  • Partner sales manager to help with onboarding, engagement, and deal close.
  • Partner marketing manager to help with joint messaging, go-to-market, and collateral.
  • A password-protected partner portal (coming soon).
  • Automation of deal registration, lead passing, and seller incentive payments.

Join Us!

We can’t wait to join with our current and future Channel Partners to deliver tomorrow’s solutions to any customer who can use astonishingly easy cloud storage! (We think that’s pretty much everybody.)

If you’re a reseller or distributor, we’d love to hear from you. If you’re a customer interested in benefiting from any of the above, we’d love to connect you with the right Channel Partner team to serve your needs. Either way, the doors are open and we look forward to helping out.

The post Channel Partner Program Launch: Ease, Transparency, and Predictability for the Channel appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

Announcing Developer Quick Starts: Open-source Code You Can Build On

Post Syndicated from Greg Hamer original https://www.backblaze.com/blog/announcing-developer-quick-starts-access-open-source-code-you-can-build-on/

Developing finished applications always requires coding custom functionality, but, as a developer, isn’t it great when you have pre-built, working code you can use as scaffolding for your applications? That way, you can get right to the custom components.

To help you finish building applications faster, we are launching our Developer Quick Start series. This series provides developers with free, open-source code available for download from GitHub. We also built pre-staged buckets with a browsable media application and sample data, and we are sharing application key pairs for read-only, programmatic access to those buckets. That means you can download the code, run it, and see the results, all without even having to create a Backblaze account!

Today, we’re debuting the first Quick Start in the series—using Python with the Backblaze S3 Compatible API. Read on to get access to all of the resources, including the code on GitHub, sample data to run it against, a video walkthrough, and guided instructions.

Announcing Our Developer Quick Start for Using Python With the Backblaze S3 Compatible API

All of the resources you need to use Python with the Backblaze S3 Compatible API are linked below:

  1. Sample Application: Get our open-source code on GitHub here.
  2. Hosted Sample Data: Experiment with a media application with Application Keys shared for read-only access here.
  3. Video Code Walk-throughs of Sample Application: Share and rewatch walk-throughs on demand here.
  4. Guided Instructions: Get instructions that guide you through downloading the sample code, running it yourself, and then using the code as you see fit, including incorporating it into your own applications here.

Depending on your skill level, the open-source code may be all that you need. If you’re new to the cloud, or just want a deeper, guided walk-through on the source code, check out the written code walk-throughs and video-guided code walk-throughs, too. Whatever works best for you, please feel free to mix and match as you see fit.


The Quick Start walks you through how to perform create and delete API operations inside your own account, all of which can be completed using Backblaze B2 Cloud Storage—and the first 10GB of storage per month are on us.

With the Quick Start code we are sharing, you can get basic functionality working and interacting with B2 Cloud Storage in minutes.
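
To give you a taste before you dive in, here is a minimal sketch (not the Quick Start code itself) of listing the contents of a bucket through the Backblaze S3 Compatible API with boto3; the endpoint, credentials, and bucket name are placeholders you would swap for your own keys or for the shared read-only keys mentioned above.

    # A minimal sketch of using boto3 against the Backblaze S3 Compatible API.
    # Endpoint, credentials, and bucket name are placeholders.
    import boto3

    s3 = boto3.client(
        "s3",
        endpoint_url="https://s3.us-west-001.backblazeb2.com",
        aws_access_key_id="<your-application-key-id>",
        aws_secret_access_key="<your-application-key>",
    )

    response = s3.list_objects_v2(Bucket="<your-bucket-name>", MaxKeys=10)
    for obj in response.get("Contents", []):
        print(obj["Key"], obj["Size"])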

Share the Love

Know someone who might be interested in leveraging the power and ease of cloud storage? Feel free to share these resources at will. Also, we welcome your participation in the projects on GitHub via pull requests. If you are satisfied, feel free to star the project on GitHub or like the videos on YouTube.

Finally, please explore our other Backblaze B2 Sample Code Repositories up on GitHub.

Stay Tuned for More

The initial launch of the Developer Quick Start series is available in Python. We will be rolling out Developer Quick Starts for other languages in the months ahead.

Which programming languages (or scripting environments) are of most interest for you? Please let us know in the comments down below. We are continually adding more working examples in GitHub projects, both in Python and in additional languages. Your feedback in the comments below can help guide what gets priority.

We look forward to hearing from you about how these Developer Quick Starts work for you!

The post Announcing Developer Quick Starts: Open-source Code You Can Build On appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

Data Protection x2: Explore What Cloud Replication Can Do

Post Syndicated from Jeremy Milk original https://www.backblaze.com/blog/data-protection-x2-explore-what-cloud-replication-can-do/

Anyone overwhelmed by their to-do list wishes they could be in two places at once. Backblaze’s newest feature—currently in beta—might not be able to grant that wish, but it will soon offer something similarly useful: The new Cloud Replication feature means data can be in two places at once, solving a whole suite of issues that keep IT teams up at night.

The Background: What Is Backblaze Cloud Replication?

Cloud Replication will enable Backblaze customers to store files in multiple regions, or create multiple copies of files in one region, across the Backblaze Storage Cloud. Simply set replication rules via web UI or API on a bucket. Once the rules are set, any data uploaded to that bucket will automatically be replicated into a destination bucket either in the same region or another region. If it sounds easy, that’s because it is—even the English majors in our Marketing department have mastered this one.
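
To give a sense of how simple a rule is, here is an illustrative sketch of what one expresses; the field names below are made up for readability rather than taken from the actual API schema, which you’ll find in the Cloud Replication documentation linked later in this post.

    # An illustrative sketch of what a replication rule says: "copy everything
    # uploaded to this source bucket into that destination bucket." The field
    # names are illustrative, not the exact API schema.
    replication_rule = {
        "ruleName": "replicate-to-second-region",
        "sourceBucket": "production-data",
        "destinationBucket": "production-data-replica",
        "fileNamePrefix": "",   # an empty prefix matches every file
        "isEnabled": True,
    }
    print(replication_rule)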

The Why: What Can Cloud Replication Do for You?

There are three key use cases for Cloud Replication:

  • Protecting data for security, compliance, and continuity purposes.
  • Bringing data closer to distant teams or customers for faster access.
  • Providing version protection for testing and staging in deployment environments.

Redundancy for Compliance and Continuity

This is the top use case for cloud replication, and will likely have value for almost any enterprise with advanced backup strategies.

Whether you are concerned about natural disasters, political instability, or complying with possible government, industry, or board regulations—replicating data to another geographic region can check a lot of boxes easily and efficiently. Especially as enterprises move completely into the cloud, data redundancy will increasingly be a requirement for:

  • Modern business continuity and disaster recovery plans.
  • Industry and board compliance efforts centered on concentration risk issues.
  • Data residency requirements stemming from regulations like GDPR.

The gold standard for backup strategies has long been a 3-2-1 approach. The core principles of 3-2-1, originally developed for an on-premises world, still hold true, and today they are being applied in even more robust ways to an increasingly cloud-based world. Cloud replication is a natural evolution for organizations that are storing much more or even all of their data in the cloud or plan to in the future. It enables you to implement the core principles of 3-2-1, including redundancy and geographic separation, all in the cloud.

Data Proximity

If you have teams, customers, or workflows spread around the world, bringing a copy of your data closer to where work gets done can minimize speed-of-light limitations. Especially for media-heavy teams in game development and postproduction, seconds can make the difference in keeping creative teams operating smoothly. And because you can automate replication and use metadata to track accuracy and process, you can remove some manual steps from the process where errors and data loss tend to crop up.

Testing and Staging

Version control and smoke testing are nothing new, but when you’re controlling versions of large applications or trying to keep track of what’s live and what’s in testing, you might need a tool with more horsepower and options for customization. Cloud Replication can serve these needs.

You can easily replicate objects between buckets dedicated for production, testing, or staging if you need to use the same data and maintain the same metadata. This allows you to observe best practices and automate replication between environments.

The Status: When Can I Get My Hands on Cloud Replication?

Cloud Replication kicked off in beta in early April and our team and early testers have been breaking in the feature since then.

Here’s how things are lined up:

  • April 18: Phase One (Underway)
    Phase one is a limited release that is currently underway. We’ve only unlocked new file replication in this release—meaning testers have to upload new data to test functionality.
  • May 24 (Projected): Phase Two
    We’ll be unlocking the “existing file” Cloud Replication functionality at this time. This means users will be able to set up replication rules on existing buckets to see how replication will work for their business data.
  • Early June (Projected): General Availability
    We’ll open the gates completely on June 7 with full functionality, yeehaw!

Want to Learn More About Cloud Replication?

Stay in the know about Cloud Replication availability—click here to get notified first.

If you want to dig into how this feature works via the CLI and API and learn about some of the edge cases, special circumstances, billing implications, and lookouts—our draft Cloud Replication documentation can be accessed here. We also have some help articles walking through how to create rules via the web application here.

Otherwise, we look forward to sharing more when this feature is fully baked and ready for consumption.

The post Data Protection x2: Explore What Cloud Replication Can Do appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

Developers: Spring Into Action With Backblaze B2 Cloud Storage

Post Syndicated from original https://www.backblaze.com/blog/developers-spring-into-action-with-backblaze-b2-cloud-storage/

Spring is in the air here in the Northern Hemisphere, and a developer’s fancy lightly turns to new projects. Whether you’ve already discovered how astonishingly easy it is to work with Backblaze B2 Cloud Storage or not, we hope you find this collection of handy tips, tricks, and resources useful—many of the techniques apply no matter where you are storing data. But first, let’s have a little fun…

Backblaze Developer Meetup

Whether you call yourself a developer, software engineer, or programmer, if you are a Backblaze B2 customer or are just Backblaze B2-curious and want to hang out in person with like-minded folks, here’s your chance. Backblaze is hosting its very first developer meetup on May 24th from 6–8 p.m. in downtown San Mateo, California. We’ll be joined by Gleb Budman, CEO and Co-founder of Backblaze, members of our Engineering team, our Developer Evangelism team, sales engineers, product managers, and more. There’ll be snacks, drinks, prizes, and more. Space is limited, so please sign up for a spot using this Google Form by May 13th and we’ll let you know if there’s space.

Join Us at GlueCon 2022

Are you going to GlueCon 2022? Backblaze will be there! GlueCon is a developer-centric event that will be held in Broomfield, Colorado on May 18th and 19th, 2022. Backblaze is the partner sponsor of the event and Pat Patterson, our chief technical evangelist, will deliver one of the keynotes. There’s still time to learn more and sign up for GlueCon 2022, but act now!

Tips and Tricks

Here’s a collection of tips and tricks we’ve published over the last few months. You can take them as written or use your imagination as to what other problems you can solve.

  • Media Transcoding With Backblaze B2 and Vultr Cloud Compute
    Your task is simple: allow users to upload video from their mobile or desktop device and then make that video available to a wide variety of devices anywhere in the world. We walk you through how we built a very simple video sharing site with Backblaze B2 and Vultr’s Infrastructure Cloud using Vultr’s Cloud Compute instances for the application servers and their new Optimized Cloud Compute instances for the transcoding workers. This includes setup instructions for Vultr and sample code in GitHub.
  • Free Image Hosting With Cloudflare and Backblaze B2
    Discover how the combination of Cloudflare and Backblaze B2 allows you to create your own, personal 10GB image hosting site for free. You start out using Cloudflare Transform Rules to give you access to HTTP traffic at the CDN edge server. This allows you to manipulate the URI path, query string, and HTTP headers of incoming requests and outgoing responses. We provide step-by-step instructions on how to setup both Cloudflare and Backblaze B2 and leave the rest up to you.
  • Building a Multiregion Origin Store With Backblaze B2 and Fastly Compute@Edge
    Compute@Edge is a serverless computing environment built on the same caching platform as the Fastly Deliver@Edge CDN. Serverless computing removes provisioning, configuration, maintenance, and scaling from the equation. One place where this technology can be used is in serving your own data from multiple Backblaze B2 regions—in other words, serve it from the closest or most available location. Learn how to create a Compute@Edge application and connect it to Backblaze B2 buckets making your data available anywhere.
  • Using a Cloudflare Worker to Send Notifications on Backblaze B2 Events
    When building an application, a common requirement is to be able to send a notification of an event (e.g., a user uploading a file) so that an application can take some action (e.g., processing the file). Learn how you can use a Cloudflare Worker to send event notifications to a wide range of recipients, allowing great flexibility when building integrations with Backblaze B2.

What’s Next?

Coming soon on our blog, we’ll provide a developer quick start kit using Python that you can use with the Backblaze S3 Compatible API to store and access data in B2 Cloud Storage. The quick start kit includes:

  1. A sample application with open-source code on GitHub.
  2. Video code walk-throughs of the sample application.
  3. Hosted sample data.
  4. Guided instructions that walk you through downloading the sample code, running it yourself, and then using the code as you see fit, including incorporating it into your own applications.

Launching in mid-May; stay tuned!
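
In the meantime, here’s a minimal Python sketch of the same idea: storing and retrieving data over the Backblaze S3 Compatible API with boto3. It is not the quick start kit itself, and the endpoint, bucket name, and credentials are placeholders you would replace with your own:

import boto3

# Point boto3 at your bucket's Backblaze S3 Compatible endpoint.
s3 = boto3.client(
    "s3",
    endpoint_url="https://s3.us-west-004.backblazeb2.com",  # placeholder endpoint
    aws_access_key_id="<application key id>",
    aws_secret_access_key="<application key>",
)

BUCKET = "my-bucket"  # placeholder bucket name

# Upload a local file, list the bucket, then download the file again.
s3.upload_file("hello.txt", BUCKET, "hello.txt")
for obj in s3.list_objects_v2(Bucket=BUCKET).get("Contents", []):
    print(obj["Key"], obj["Size"])
s3.download_file(BUCKET, "hello.txt", "hello-copy.txt")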

Wrap-up

Hopefully you’ve found a couple of things you can try out using Backblaze B2 Cloud Storage. Join the many developers around the world who have discovered how easy it can be to work with Backblaze B2. If you have any questions, you can visit www.backblaze.com/help.html to use our Knowledge Base, chat with our customer support, or submit a customer support request. Of course, you’ll find lots of other developers online who are more than willing to help as well. Good luck and invent something awesome.

The post Developers: Spring Into Action With Backblaze B2 Cloud Storage appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

Backblaze Drive Stats for Q1 2022

Post Syndicated from original https://www.backblaze.com/blog/backblaze-drive-stats-for-q1-2022/

A long time ago, in a galaxy far, far away, Backblaze began collecting and storing statistics about the hard drives it uses to store customer data. As of the end of Q1 2022, Backblaze was monitoring 211,732 hard drives and SSDs in our data centers around the universe. Of that number, there were 3,860 boot drives, leaving us with 207,872 data drives under management. This report will focus on those data drives. We will review the hard drive failure rates for those drive models that were active as of the end of Q1 2022, and we’ll also look at their lifetime failure statistics. In between, we will dive into the failure rates of the active drive models over time. Along the way, we will share our observations and insights on the data presented and, as always, we look forward to you doing the same in the comments section at the end of the report.

“The greatest teacher, failure is.”1

As of the end of Q1 2022, Backblaze was monitoring 207,872 hard drives used to store data. For our evaluation, we removed 394 drives from consideration as they were either used for testing purposes or were drive models which did not have at least 60 active drives. This leaves us with 207,478 hard drives to analyze for this report. The chart below contains the results of our analysis for Q1 2022.
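
For readers new to these reports, the annualized failure rate (AFR) in that table is computed from drive days, where one drive running for one day counts as one drive day. Here’s a quick Python sketch of the calculation, using made-up numbers purely for illustration:

# AFR = failures per drive day, scaled to a full year.
drive_days = 1_000_000  # illustrative: total days all drives of one model ran this quarter
failures = 30           # illustrative: failures of that model over the same period

afr_percent = failures / drive_days * 365 * 100
print(f"AFR = {afr_percent:.2f}%")  # roughly 1.1% for these made-up numbers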

“Always pass on what you have learned.”2

In reviewing the Q1 2022 table above and the data that lies underneath, we offer a few observations and caveats:

  • “The Force is strong with this one.”3 The 6TB Seagate (model: ST6000DX000) continues to defy time with zero failures during Q1 2022 despite an average age of nearly seven years (83.7 months). 98% of the drives (859) were installed within the same two-week period back in Q1 2015. The youngest 6TB drive in the entire cohort is a little over four years old. The 4TB Toshiba (model: MD04ABA400V) also had zero failures during Q1 2022 and the average age (82.3 months) is nearly as old as the Seagate drives, but the Toshiba cohort has only 97 drives. Still, they’ve averaged just one drive failure per year over their Backblaze lifetime.
  • “Great, kid, don’t get cocky.”4 There were a number of padawan drives (in average age) that also had zero drive failures in Q1 2022. The two 16TB WDC drives (models: WUH721816ALEL0 and WUH721816ALEL4) lead the youth movement with an average age of 5.9 and 1.5 months respectively. Between the two models, there are 3,899 operational drives and only one failure since they were installed six months ago. A good start, but surely not Jedi territory yet.
  • “I find your lack of faith disturbing.”5 You might have noticed the AFR for Q1 2022 of 24.31% for the 8TB HGST drives (model: HUH728080ALE604). The drives are young with an average age of two months, and there are only 76 drives with a total of 4,504 drive days. If you find the AFR bothersome, I do in fact find your lack of faith disturbing, given the history of stellar performance in the other HGST drives we employ. Let’s see where we are in a couple of quarters.
  • “Try not. Do or do not. There is no try.”6 The saga continues for the 14TB Seagate drives (model: ST14000NM0138). When we last saw this drive, the Seagate/Dell/Backblaze alliance continued to work diligently to understand why the failure rate was stubbornly high. Unusual it is for this model, and the team has employed multiple firmware tweaks over the past several months with varying degrees of success. Patience.

“I like firsts. Good or bad, they’re always memorable.”7

We have been delivering quarterly and annual Drive Stats reports since Q1 2015. Along the way, we have presented multiple different views of the data to help provide insights into our operational environment and the hard drives in that environment. Today we’d like to offer a new way to visualize the data: a comparison of the average age of each drive model we currently use against its annualized failure rate, which we call the Drive Stats Failure Square.

“…many of the truths that we cling to depend on our viewpoint.”8

Each point on the Drive Stats Failure Square represents a hard drive model in operation in our environment as of 3/31/2022 and lies at the intersection of the average age of that model and its annualized failure rate. We only included drive models with a lifetime total of at least one million drive days or with a confidence interval of 0.6 or less.

The resulting chart is divided into four equal quadrants, which we will categorize as follows:

  • Quadrant I: Retirees. Drives in this quadrant have performed well, but given their current high AFR level they are first in line to be replaced.
  • Quadrant II: Winners. Drives in this quadrant have proven themselves to be reliable over time. Given their age, we need to begin planning for their replacement, but there is no need to panic.
  • Quadrant III: Challengers. Drives in this quadrant have started off on the right foot and don’t present any current concerns for replacement. We will continue to monitor these drive models to ensure they stay on the path to the Winners quadrant instead of sliding off to Quadrant IV.
  • Quadrant IV: Muddlers. Drives in this quadrant should be replaced if possible, but they can continue to operate if their failure rates remain at their current rate. The redundancy and durability built into the Backblaze platform protects data from the higher failure rates of the drives in this quadrant. Still, these drives are a drain on data center and operational resources.

“Difficult to see; always in motion is the future.”9

Obviously, the Winners quadrant is the desired outcome for all of the drive models we employ. But every drive basically starts out in either Quadrant III or Quadrant IV and moves from there over time. The chart below shows how the drive models in Quadrant II (Winners) got there.

“Your focus determines your reality.”10

Each drive model is represented by a snake-like line (Snakes on a plane!?) which shows the AFR of the drive model as the average age of the fleet increased over time. Interestingly, each of the six models currently in Quadrant II has a different backstory. For example, who could have predicted that the 6TB Seagate drive (model: ST6000DX000) would end up in the Winners quadrant, given its less than auspicious start in 2015? And that drive was not alone; the 8TB Seagate drives (models: ST8000NM0055 and ST8000DM002) experienced the same behavior.

This chart can also give us a visual clue as to the direction of the annualized failure rate over time for a given drive model. For example, the 10TB Seagate drive seems more interested in moving into the Retirees quadrant over the next quarter or so, and as such its replacement priority could be increased.

“In my experience, there’s no such thing as luck.”11

In the quarterly Drive Stats table at the start of this report, there is some element of randomness which can affect the results. For example, whether a drive is reported as a failure on the 31st of March at 11:59 p.m. or at 12:01 a.m. on April 1st can have a small effect on the results. Still, the quarterly results are useful in surfacing unexpected failure rate patterns, but the most accurate information regarding a given drive model is captured in the lifetime annualized failure rates.

The chart below shows the lifetime annualized failure rates of all the drive models in production as of March 31, 2022.

“You have failed me for the last time…”12

The lifetime annualized failure rate for all the drives listed above is 1.39%. That was down from 1.40% at the end of 2021. One year ago (3/31/2021), the lifetime AFR was 1.49%.

When looking at the lifetime failure table above, any drive models with fewer than 500,000 drive days or a confidence interval greater than 1.0% do not have enough data to be considered an accurate portrayal of their performance in our environment. The 8TB HGST drives (model: HUH728080ALE604) and the 16TB Toshiba drives (model: MG08ACA16TA) are good examples of such drives. We list these drives for completeness as they are also listed in the quarterly table at the beginning of this review.

Given the criteria above regarding drive days and confidence intervals, the best performing drive in our environment for each manufacturer is:

  • HGST: 12TB, model: HUH721212ALE600, AFR: 0.33%
  • Seagate: 12TB, model: ST12000NM001G, AFR: 0.63%
  • WDC: 14TB, model: WUH721414ALE6L4, AFR: 0.33%
  • Toshiba: 16TB, model: MG08ACA16TEY, AFR: 0.70%

“I never ask that question until after I’ve done it!”13

For those of you interested in how we produce this report, the data we used is available on our Hard Drive Test Data webpage. You can download and use this data for free for your own purposes. All we ask are three things: 1) you cite Backblaze as the source if you use the data, 2) you accept that you are solely responsible for how you use the data, and 3) you do not sell the data itself to anyone; it is free.
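
If you’d like to crunch the numbers yourself, the short Python sketch below shows one way to reproduce a per-model AFR from the downloaded data. The folder name is a placeholder, and it assumes the daily CSV layout of one row per drive per day with model and failure columns; check the download for the exact schema:

import glob
import pandas as pd

# Read every daily CSV for the quarter; keep only the columns we need.
frames = [
    pd.read_csv(path, usecols=["model", "failure"])
    for path in glob.glob("data_Q1_2022/*.csv")  # placeholder folder name
]
data = pd.concat(frames, ignore_index=True)

# One row per drive per day, so the row count per model is its drive days.
stats = data.groupby("model").agg(
    drive_days=("failure", "size"),
    failures=("failure", "sum"),
)
stats["afr_percent"] = stats["failures"] / stats["drive_days"] * 365 * 100

# Apply the same kind of cutoff the report uses for meaningful sample sizes.
print(stats[stats["drive_days"] >= 500_000].sort_values("afr_percent"))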

Good luck and let us know if you find anything interesting. And no, it’s not a trap.

Quotes Referenced

  1. “The greatest teacher, failure is.”—Yoda, “The Last Jedi”
  2. “Always pass on what you have learned.”—Yoda, “Return of the Jedi”
  3. “The Force is strong with this one.”—Darth Vader, “A New Hope”
  4. “Great, kid, don’t get cocky.”—Han Solo, “A New Hope”
  5. “I find your lack of faith disturbing.”—Darth Vader, “A New Hope”
  6. “Try not. Do or do not. There is no try.”—Yoda, “The Empire Strikes Back”
  7. “I like firsts. Good or bad, they’re always memorable.”—Ahsoka Tano, “The Mandalorian”
  8. “…many of the truths that we cling to depend on our viewpoint.”—Obi-Wan Kenobi, “Return of the Jedi”
  9. “Difficult to see; always in motion is the future.”—Yoda, “The Empire Strikes Back”
  10. “Your focus determines your reality.”—Qui-Gon Jinn, “The Phantom Menace”
  11. “In my experience, there’s no such thing as luck.”—Obi-Wan Kenobi, “A New Hope”
  12. “You have failed me for the last time…”—Darth Vader, “The Empire Strikes Back”
  13. “I never ask that question until after I’ve done it!”—Han Solo, “The Force Awakens”

The post Backblaze Drive Stats for Q1 2022 appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

Use a Cloudflare Worker to Send Notifications on Backblaze B2 Events

Post Syndicated from Pat Patterson original https://www.backblaze.com/blog/use-a-cloudflare-worker-to-send-notifications-on-backblaze-b2-events/

When building an application or solution on Backblaze B2 Cloud Storage, a common requirement is to be able to send a notification of an event (e.g., a user uploading a file) so that an application can take some action (e.g., processing the file). In this blog post, I’ll explain how you can use a Cloudflare Worker to send event notifications to a wide range of recipients, allowing great flexibility when building integrations with Backblaze B2.

Why Use a Proxy to Send Event Notifications?

Event notifications are useful whenever you need to ensure that a given event triggers a particular action. For example, last month, I explained how a video sharing site running on Vultr’s Infrastructure Cloud could store raw and transcoded videos in Backblaze B2. In that example, when a user uploaded a video to a Backblaze B2 bucket via the web application, the web app sent a notification to a Worker app instructing the Worker to read the raw video file from the bucket, transcode it, and upload the processed file back to Backblaze B2.

A drawback of this approach is that, if we were to create a mobile app to upload videos, we would have to copy the notification logic into the mobile app. As the system grows, so does the maintenance burden. Each new app needs code to send notifications and, worse, if we need to add a new field to the notification message, we have to update all of the apps. If, instead, we move the notification logic from the web application to a Cloudflare Worker, we can send notifications on Backblaze B2 events from a single location, regardless of the origin of the request. This pattern of wrapping an API with a component that presents the exact same API but adds its own functionality is known as a proxy.

Cloudflare Workers: A Brief Introduction

Cloudflare Workers provides a serverless execution environment that allows you to create applications that run on Cloudflare’s global edge network. A Cloudflare Worker application intercepts all HTTP requests destined for a given domain, and can return any valid HTTP response. Your Worker can create that HTTP response in any way you choose. Workers can consume a range of APIs, allowing them to directly interact with the Cloudflare cache, manipulate globally unique Durable Objects, perform cryptographic operations, and more.

Cloudflare Workers often, but not always, implement the proxy pattern, sending outgoing HTTP requests to servers on the public internet in the course of servicing incoming requests. If we implement a proxy that intercepts requests from clients to Backblaze B2, it could both forward those requests to Backblaze B2 and send notifications of those requests to one or more recipient applications.

This example focuses on proxying requests to the Backblaze S3 Compatible API, and can be used with any S3 client application that works with Backblaze B2 by simply changing the client’s endpoint configuration.

Implementing a similar proxy for the B2 Native API is much simpler, since B2 Native API requests are secured by a bearer token rather than a signature. A B2 Native API proxy would simply copy the incoming request, including the bearer token, changing only the target URL. Look out for a future blog post featuring a B2 Native API proxy.

Proxying Backblaze B2 Operations With a Cloudflare Worker

S3 clients send HTTP requests to the Backblaze S3 Compatible API over a TLS-secured connection. Each request includes the client’s Backblaze Application Key ID (access key ID in AWS parlance) and is signed with its Application Key (secret access key), allowing Backblaze B2 to authenticate the client and verify the integrity of the request. The signature algorithm, AWS Signature Version 4 (SigV4), includes the Host header in the signed data, ensuring that a request intended for one recipient cannot be redirected to another. Unfortunately, this is exactly what we want to happen in this use case!

Our proxy Worker must therefore validate the signature on the incoming request from the client, and then create a new signature that it can include in the outgoing request to the Backblaze B2 endpoint. Note that the Worker must be configured with the same Application Key and ID as the client to be able to validate and create signatures on the client’s behalf.

Here’s the message flow:

  1. A user performs an action in a Backblaze B2 client application, for example, uploading an image.
  2. The client app creates a signed request, exactly as it would for Backblaze B2, but sends it to the Cloudflare Worker rather than directly to Backblaze B2.
  3. The Worker validates the client’s signature, and creates its own signed request.
  4. The Worker sends the signed request to Backblaze B2.
  5. Backblaze B2 validates the signature, and processes the request.
  6. Backblaze B2 returns the response to the Worker.
  7. The Worker forwards the response to the client app.
  8. The Worker sends a notification to the webhook recipient.
  9. The recipient takes some action based on the notification.

These steps are illustrated in the diagram below.

The validation and signing process imposes minimal overhead, even for requests with large payloads, since the signed data includes a SHA-256 digest of the request payload, included with the request in the x-amz-content-sha256 HTTP header, rather than the payload itself. The Worker need not even read the incoming request payload into memory, instead passing it to the Cloudflare Fetch API to be streamed directly to the Backblaze B2 endpoint.
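
To make the digest concrete, here is a small Python sketch of how a client computes the value carried in the x-amz-content-sha256 header; the file name is just an example:

import hashlib

def payload_digest(path: str) -> str:
    # Stream the file through SHA-256 so even large payloads never sit in memory.
    sha256 = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):  # 1 MiB chunks
            sha256.update(chunk)
    return sha256.hexdigest()

print(payload_digest("video.mp4"))  # 64-character hex digest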

The Worker returns Backblaze B2’s response to the client unchanged, and creates a JSON-formatted webhook notification containing the following parameters:

  • contentLength: Size of the request body, if there was one, in bytes.
  • contentType: Describes the request body, if there was one. For example, image/jpeg.
  • method: HTTP method, for example, PUT.
  • signatureTimestamp: Request timestamp included in the signature.
  • status: HTTP status code returned from B2 Cloud Storage, for example 200 for a successful request or 404 for file not found.
  • url: The URL requested from B2 Cloud Storage, for example, https://s3.us-west-004.backblazeb2.com/my-bucket/hello.txt.

The Worker submits the notification to Cloudflare for asynchronous processing, so that the response to the client is not delayed. Once the interaction with the client is complete, Cloudflare POSTs the notification to the webhook recipient.

Prerequisites

If you’d like to follow the steps below to experiment with the proxy yourself, you will need a Backblaze B2 account with an Application Key and ID, plus a Cloudflare account with the wrangler CLI installed.

1. Creating a Cloudflare Worker Based on the Proxy Code

The Cloudflare Worker B2 Webhook GitHub repository contains full source code and configuration details. You can use the repository as a template for your own Worker using Cloudflare’s wrangler CLI. You can change the Worker name (my-proxy in the sample code below) as you see fit:

wrangler generate my-proxy https://github.com/backblaze-b2-samples/cloudflare-b2-proxy
cd my-proxy

2. Configuring and Deploying the Cloudflare Worker

You must configure AWS_ACCESS_KEY_ID and AWS_S3_ENDPOINT in wrangler.toml before you can deploy the Worker. Configuring WEBHOOK_URL is optional—you can set it to empty quotes if you just want a vanity URL for Backblaze B2.

[vars]

AWS_ACCESS_KEY_ID = "<your b2 application key id>"
AWS_S3_ENDPOINT = "<your endpoint - e.g. s3.us-west-001.backblazeb2.com>"
AWS_SECRET_ACCESS_KEY = "Remove this line after you make AWS_SECRET_ACCESS_KEY a secret in the UI!"
WEBHOOK_URL = "<e.g. https://api.example.com/webhook/1>"

Note the placeholder for AWS_SECRET_ACCESS_KEY in wrangler.toml. All variables used in the Worker must be set before the Worker can be published, but you should not save your Backblaze B2 application key to the file (see the note below). We work around these constraints by initializing AWS_SECRET_ACCESS_KEY with a placeholder value.

Use the CLI to publish the Worker project to the Cloudflare Workers environment:

wrangler publish

Now log in to the Cloudflare dashboard, navigate to your new Worker, and click the Settings tab, Variables, then Edit Variables. Remove the placeholder text, and paste your Backblaze B2 Application Key as the value for AWS_SECRET_ACCESS_KEY. Click the Encrypt button, then Save.

Finally, you must remove the placeholder line from wrangler.toml. If you do not do so, then the next time you publish the Worker, the placeholder value will overwrite your Application Key.

Why Not Just Set AWS_SECRET_ACCESS_KEY in wrangler.toml?

You should never, ever save secrets such as API keys and passwords in source code files. It’s too easy to forget to remove sensitive data from source code before sharing it either privately or, worse, on a public repository such as GitHub.

You can access the Worker via its default endpoint, which will have the form https://my-proxy.<your-workers-subdomain>.workers.dev, or create a DNS record in your own domain and configure a route associating the custom URL with the Worker.

If you try accessing the Worker URL via the browser, you’ll see an error message:

<Error>
<Code>AccessDenied</Code>
<Message>
Unauthenticated requests are not allowed for this api
</Message>
</Error>

This is expected—the Worker received the request, but the request did not contain a signature.

3. Configuring the Client Application

The only change required in your client application is the S3 endpoint configuration. Set it to your Cloudflare Worker’s endpoint rather than your Backblaze account’s S3 endpoint. As mentioned above, the client continues to use the same Application Key and ID as it did when directly accessing the Backblaze S3 Compatible API.
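
For example, if your client were a Python application built on boto3, only the endpoint_url would change; the Worker hostname below is a placeholder:

import boto3

# Same Application Key and ID as before; only the endpoint points at the Worker.
s3 = boto3.client(
    "s3",
    endpoint_url="https://my-proxy.example.workers.dev",  # the proxy Worker, not the B2 endpoint
    aws_access_key_id="<application key id>",
    aws_secret_access_key="<application key>",
)

# Requests flow through the Worker to Backblaze B2, and each one triggers a notification.
s3.upload_file("image001.png", "my-bucket", "image001.png")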

4. Implementing a Webhook Consumer

The webhook consumer must accept JSON-formatted messages via HTTP POSTs at a public endpoint accessible from the Cloudflare Workers environment. The webhook notification looks like this:

{
"contentLength": 30155,
"contentType": "image/png",
"method": "PUT",
"signatureTimestamp": "20220224T193204Z",
"status": 200,
"url": "https://s3.us-west-001.backblazeb2.com/my-bucket/image001.png"
}
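
If you are writing your own consumer, a minimal sketch in Python using only the standard library might look like this (the port is arbitrary):

import json
from http.server import BaseHTTPRequestHandler, HTTPServer

class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read and parse the JSON notification body.
        length = int(self.headers.get("Content-Length", 0))
        event = json.loads(self.rfile.read(length))
        print(f"{event['method']} {event['url']} -> {event['status']}")
        # Acknowledge receipt; real processing would be kicked off from here.
        self.send_response(200)
        self.end_headers()

if __name__ == "__main__":
    HTTPServer(("", 8080), WebhookHandler).serve_forever()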

You might implement the webhook consumer in your own application or, alternatively, use an integration platform such as IFTTT, Zapier, or Pipedream to trigger actions in downstream systems. I used Pipedream to create a workflow that logs each Backblaze B2 event as a new row in a Google Sheet. Watch it in action in this short video:

Put the Proxy to Work!

The Cloudflare Worker/Backblaze B2 Proxy can be used as-is in a wide variety of integrations—anywhere you need an event in Backblaze B2 to trigger an action elsewhere. At the same time, it can be readily adapted for different requirements. Here are a few ideas.

In this initial implementation, the client uses the same credentials to access the Worker as the Worker uses to access Backblaze B2. It would be straightforward to use different credentials for the upstream and downstream connections, ensuring that clients can’t bypass the Worker and access Backblaze B2 directly.

POSTing JSON data to a webhook endpoint is just one of many possibilities for sending notifications. You can integrate the worker with any system accessible from the Cloudflare Workers environment via HTTP. For example, you could use a stream-processing platform such as Apache Kafka to publish messages reliably to any number of consumers, or, similarly, send a message to an Amazon Simple Notification Service (SNS) topic for distribution to SNS subscribers.

As a final example, the proxy has full access to the request and response payloads. Rather than sending a notification to a separate system, the worker can operate directly on the data, for example, transparently compressing incoming uploads and decompressing downloads. The possibilities are endless.

How will you put the Cloudflare Worker Backblaze B2 Proxy to work? Sign up for a Backblaze B2 account and get started!

The post Use a Cloudflare Worker to Send Notifications on Backblaze B2 Events appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

A Refreshing Partnership: Backblaze and SoDA

Post Syndicated from Jennifer Newman original https://www.backblaze.com/blog/a-refreshing-partnership-backblaze-and-soda/

Editor’s Note: SoDA and Backblaze will be at NAB 2022 and would love to tell you more about our joint solution—offering data analysis and movement, FREE during initial migration—at NAB. Set up a meeting here.

Moving all your stuff is one of the most paralyzing projects imaginable. Which is why professional movers are amazing: one tackles the dishes, a couple folks hit the mattresses and closets. And there’s one guy (probably the rookie) who gets assigned to the junk drawers and odd gadgets. Suddenly, your old house is empty and your life is safe and orderly in boxes moving across the country.

Now imagine moving your business’s most valuable data across the country or the world—whether it’s the organization, the security, the budgeting, or all of the above and then some—it can be absolutely paralyzing, even when your current data storage approach is holding you back and you know you need to make a change.

This is where SoDA comes in.

Essentially your professional movers in the cloud, the SoDA team analyzes your cloud or on-prem infrastructure and then orchestrates the movement, replication, or syncing of data to wherever you want it to go—limiting any downtime in the process and ensuring your data is secure in flight and structured exactly as you need it in its new home. If deciding where to send data is an issue, they’ll use the analysis of your existing setup to scope the best solution by value for your business.

The Backblaze and SoDA Partnership leverages SoDA’s data movement services to unlock Backblaze B2 Cloud Storage’s value for more businesses. The partnership offers the following benefits:

  • A cost analysis of your existing storage infrastructure.
  • A “dry run” feature that compares existing storage costs to new storage costs and any transfer costs so you “know before you go.”
  • The ability to define policies for how the data should move and where.
  • Flexibility to move, copy, sync, or archive data to Backblaze B2.
  • Migration and management via the Backblaze S3 Compatible API—easily migrate data, and then develop and manage both on-prem and cloud data via the API going forward.

Why Should You Try Backblaze and SoDA?

First: Backblaze will pay for SoDA’s services for any customer who agrees to migrate 10TB or more and commit to maintaining at least 10TB in Backblaze B2 for a minimum of one year.*

People don’t believe this when we tell them, but we’ll say it again: You won’t receive an invoice for your initial data migration, ever.

If that’s not reason enough to run a proof of concept, here’s more to think about:

Moving a couple of files to the cloud is easy peasy. But what happens if you have billions of files structured in multiple folders across multiple storage locations? You could use legacy tools or command line tools, but all of the scripting and resource management for the data in flight will be on you. You’re smart enough to do it, but if someone else is willing to pay the metaphorical movers, why deal with the hassle?

With SoDA, you do not have to worry about any of it. You define your source locations, define your destination Backblaze B2 bucket, and start a transfer. SoDA takes care of the rest. That is truly easy peasy.

An Example of the Backblaze and SoDA Value Proposition

One customer we recently worked with was managing data in their own data center and having issues with reliability and SLAs for current customers. They needed availability at 99.999% as well as cost-effectiveness for future scaling. They identified Backblaze B2 as a provider that checked both boxes and Backblaze recommended SoDA for the move. The customer migrated 1PB of data (over a billion files) into B2 Cloud Storage. Other than making the decision and pointing where the data should go, the customer didn’t have to lift a finger.

Try It Today

If you’re not convinced yet, the SoDA and Backblaze teams are ready to make your life easier at any time. You can schedule a meeting here. Or you can check out the Quickstart guide to explore the solution today.

*Conditions may apply.

The post A Refreshing Partnership: Backblaze and SoDA appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

Ransomware Takeaways From Q1 2022

Post Syndicated from Molly Clancy original https://www.backblaze.com/blog/ransomware-takeaways-from-q1-2022/

The impact of the war in Ukraine is evolving in real time, particularly when it comes to the ransomware landscape. Needless to say, it dominated the ransomware conversation throughout Q1 2022. This quarter, we’re digging into some of the consequences of the invasion and what they mean for you, along with a few broader domestic developments.

Why? Staying up to date on ransomware trends can help you prepare your security infrastructure in the short and long term to protect your valuable data. In this series, we share five key takeaways based on what we saw over the previous quarter. Here’s what we observed in Q1 2022.

This post is a part of our ongoing series on ransomware. Take a look at our other posts for more information on how businesses can defend themselves against a ransomware attack, and more.

➔ Download The Complete Guide to Ransomware E-book

1. Sanctions and International Attention May Have Depressed Some Ransomware Activity

Following the ground invasion, ransomware attacks seemed to go eerily quiet, especially when government officials had predicted cyberattacks could be a key tactic. That’s not to say attacks weren’t being carried out (some may simply have gone unreported), but the radio silence was notable enough that a few media outlets wondered why.

International attention may be one reason—cybercriminals tend to be wary of the spotlight. Having the world’s eyes on a region where much cybercrime originates seems to have pushed cybercriminals into the shadows. The sanctions imposed on Russia have made it more difficult for cybercrime syndicates based in the country to receive, convert, and disperse payment from victims. The war also may have caused some chaos within ransomware syndicates and fomented fears that cyberinsurers would not pay for claims. As a result, we’ve seen a slowing of ransomware incidents in the first quarter, but that may not last.

Key Takeaway: While ransomware attacks may be down short-term, no one should be lulled into thinking the threat is gone, especially with government agencies on high alert and warnings from the highest levels that businesses should still be on guard.

2. Long-term Socioeconomic Impacts Could Trigger a New Wave of Cybercrime

As part of their ongoing analysis, cybersecurity consultants Coveware illustrated how the socioeconomic precarity caused by sanctions could lead to a larger number of people turning to cybercrime as a way to support themselves. In their reporting, they analyzed the number of trained cybersecurity professionals who they’d expect to be out of work given Russia’s rising unemployment rate in order to estimate a pool of potential new ransomware operators. To double the number of individuals currently acting as ransomware operators, they found that only 7% of the newly unemployed workforce would have to convert to cybercrime.

They note, however, that it remains to be seen what impact a larger labor pool would have since new entrants looking for fast cash may not be as willing to put in the time and effort to carry out big game tactics that typified the first half of 2021. As such, Coveware would expect to see an increase in attacks on small to medium-sized enterprises (which already make up the largest portion of ransomware victims today) and a decline in ransom demands with new operators hoping to make paying up more attractive for victims.

Key Takeaway: If the threat materializes, new entrants to the ransomware game are likely to try to fly under the radar, which means we would expect to see a larger number of small to medium-sized businesses targeted with ransoms that won’t make headlines, but that nonetheless hurt the businesses affected.

3. One Ransomware Operator Paid the Price for Russian Allegiance; Others Declared Neutrality

In February, ransomware group Conti declared their support for Russian actions and threatened to retaliate against Western entities targeting Russian infrastructure. But Conti appears to have miscalculated the loyalty of its affiliates, many of whom are likely pro-Ukraine. The declaration backfired when one of their affiliates leaked chat logs following the announcement. Shortly after, LockBit, another prolific ransomware group, took a cue from Conti’s blunder, declaring neutrality and swearing off any attacks against Russia’s many enemies. Their reasoning? Surprisingly inclusive for an organized crime syndicate:

“Our community consists of many nationalities of the world, most of our pentesters are from the CIS including Russians and Ukrainians, but we also have Americans, Englishmen, Chinese, French, Arabs, Jews, and many others in our team… We are all simple and peaceful people, we are all Earthlings.”

As we know, the ransomware economy is a wide, interconnected network of actors with varying political allegiances. The actions of LockBit may assuage some fears that Russia would be able to weaponize the cybercrime groups that have been allowed to operate with impunity within its borders, but that’s no reason to rest easy.

Key Takeaway: LockBit’s actions and words reinforce the one thing we know for sure about cybercriminals: Despite varying political allegiances, they’re unified by money and they will come after it if it’s easy for the taking.

4. CISA Reports the Globalized Threat of Ransomware Increased in 2021

The U.S. Cybersecurity and Infrastructure Security Agency (CISA) released a statement in March summarizing the trends they saw throughout 2021. They outlined a number of tactics that we saw throughout the year as well, including:

  • Targeting attacks on holidays and weekends.
  • Targeting managed service providers.
  • Targeting backups stored in on-premises devices and in the cloud.

Among others, these tactics pose a threat to critical infrastructure, healthcare, financial institutions, education, businesses, and nonprofits globally.

Key Takeaway: The advisory outlines 18 mitigation strategies businesses and organizations can take to protect themselves from ransomware, including some of the top strategies as we see it: protecting cloud storage by backing up to multiple locations, requiring MFA for access, and encrypting data in the cloud.

5. Russia Could Use Ransomware to Offset Sanctions

Despite our first observation that ransomware attacks slowed somewhat early in the quarter, the Financial Crimes Enforcement Network (FinCEN) issued an alert in March that Russia may employ state-sponsored actors to evade sanctions and bring in cryptocurrency by ramping up attacks. They warned financial institutions, specifically, to be vigilant against these threats to help thwart attempts by state-sponsored Russian actors to extort ransomware payments.

The warnings follow an increase in phishing and distributed denial-of-service (DDoS) attacks that have persisted throughout the year and increased toward the end of February into March as reported by Google’s Threat Analysis Group. In reports from ThreatPost covering the alert as well as Google’s observations, cybersecurity experts seemed doubtful that ransomware payouts would make much of a dent in alleviating the sanctions, and noted that opportunities to use ransomware were more likely on an individual level.

Key Takeaway: The warnings serve as a reminder that both individual actors and state-sponsored entities have ransomware tools at their disposal to use as a means to retaliate against sanctions or simply support themselves, and that the best course of action is to shore up defenses before the anticipated threats materialize.

What This All Means for You

The changing political landscape will continue to shape the ransomware economy in new and unexpected ways. Being better prepared to avoid or mitigate the effects of ransomware makes more and more sense when you can’t be sure what to expect. Ransomware protection doesn’t have to be costly or confusing. Check out our ransomware protection solutions to get started.

The post Ransomware Takeaways From Q1 2022 appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

Announcing Backblaze B2’s Universal Data Migration

Post Syndicated from Jeremy Milk original https://www.backblaze.com/blog/announcing-backblaze-b2-universal-data-migration/

Your data is valuable. Whether you’re sequencing genomes, managing a media powerhouse, or running your own business, you need fast, affordable, ready access to it in order to achieve your goals. But you can’t get the most out of your data if it’s locked in to a provider where it’s hard to manage or time-consuming to retrieve. Unfortunately, due to egress fees and closed, “all-in-one” platforms, vendor lock-in is currently trapping too many companies.

Backblaze can help: Universal Data Migration, a new service launched today, covers all data transfer costs, including legacy provider egress fees, and manages data migration from any legacy on-premises or cloud source. In short, your migration to Backblaze B2 Cloud Storage is on us.

Many of the businesses we’ve spoken to about this didn’t believe that the service was free at first. But seriously—you will never see an invoice for your transfer fees, any egress fees levied by your legacy vendor when you pull data out, or for the assistance in moving data.

If you’re still in doubt, read on to learn more about how Universal Data Migration can help you say goodbye to vendor lock-in, cold delays, and escalating storage costs, and hello to B2 Cloud Storage gains, all without fears of cost, complexity, downtime, or data loss.

How Does Universal Data Migration Work?

Backblaze has curated a set of integrated services to handle migrations from pretty much every source, including:

  • Public cloud storage
  • Servers
  • Network attached storage (NAS)
  • Storage area networks (SAN)
  • Tape/LTO solutions
  • Cloud drives

We cover data transfer and egress costs and facilitate the migration to Backblaze B2 Cloud Storage. The turnkey service expands on our earlier Cloud to Cloud Migration services as well as transfers via the internet and the Backblaze Fireball rapid ingest devices. These offerings are now rolled up into one universal service.

“I thought moving my files would be the hardest part of the process, and it’s why I never really thought about switching providers before, but it was easy.”
—Tristan Pelligrino, Co-founder, Motion

We do ask that companies who use the service commit to maintaining at least 10TB in Backblaze B2 for a minimum of one year, but we expect that our cloud storage pricing—a quarter the cost of comparable services—and our interoperability with other cloud services will keep new customers happy for that first year and beyond.

Outside of specifics that will vary by your unique infrastructure and workflows, migration types include:

  • Cloud to cloud: Reads from public cloud storage or a cloud drive (e.g., Amazon S3 or Google Drive) and writes to Backblaze B2 via inter-cloud bandwidth.
  • On-premises to cloud: Reads from a server, NAS, or SAN and writes to Backblaze B2 over optimized cloud pipes or via Backblaze’s 96TB Fireball rapid ingest device.
  • LTO/tape to cloud: Reads tape media, from reel cassettes to cartridges and more, and writes to Backblaze B2 via a high-speed, direct connection.

Backblaze also supports simple internet transfers for moving files over your existing bandwidth—with multi-threading to maximize speed.
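
As a rough illustration of what multi-threading looks like in practice, boto3’s transfer manager can parallelize a large upload to the Backblaze S3 Compatible API. This is only a sketch of the general technique, not the migration service itself; the endpoint, credentials, and file names are placeholders:

import boto3
from boto3.s3.transfer import TransferConfig

s3 = boto3.client(
    "s3",
    endpoint_url="https://s3.us-west-004.backblazeb2.com",  # placeholder endpoint
    aws_access_key_id="<application key id>",
    aws_secret_access_key="<application key>",
)

# Split large files into parts and upload several parts at once.
config = TransferConfig(
    multipart_threshold=64 * 1024 * 1024,  # use multipart uploads above 64MB
    multipart_chunksize=64 * 1024 * 1024,
    max_concurrency=10,                    # ten upload threads
)

s3.upload_file("archive.tar", "my-bucket", "archive.tar", Config=config)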

How Much Does Universal Data Migration Cost?

Not to sound like a broken record, but this is the best part—the service is entirely free to you. You’ll never receive a bill. Backblaze covers all data transfer and legacy vendor egress or download fees for inbound migrations of 10TB or more with a one-year commitment. It’s pretty cool that we can help save you money; it’s even cooler that we can help more businesses build the tech stacks they want using unconflicted providers to truly get the most out of their data.

Fortune Media Reduces Storage Costs by Two-thirds With Universal Data Migration

 
After divesting from its parent company, Fortune Media rebuilt its technology infrastructure and moved many services, including data storage, to the cloud. However, the initial tech stack was expensive, difficult to use, and not 100% reliable.

Backblaze B2 offered a more reliable and cost-effective solution for both hot cloud storage and archiving. In addition, the platform’s ease of use would give Fortune’s geographically dispersed video editors a modern, self-service experience, and it was easier for the IT team to manage.

Using Backblaze’s Cloud to Cloud Migration, now part of Universal Data Migration, the team transferred over 300TB of data from their legacy provider in less than a week with zero downtime, business disruption, or egress costs, and was able to cut overall storage costs by two-thirds.

“In the cloud space, the biggest complaint that we hear from clients is the cost of egress and storage. With Backblaze, we saved money on the migration, but also overall on the storage and the potential future egress of this data.”
—Tom Kehn, Senior Solutions Architect at CHESA, Fortune Media’s technology systems integrator

Even More Benefits

What else do you get with Universal Data Migration? Additional benefits include:

  • Truly universal migrations: Secure data mobility from practically any source.
  • Support along the way: Simple, turnkey services with solution engineer support to help ensure easy success.
  • Safe and speedy transfers: Proven to securely transfer millions of objects and petabytes of data, often in just days.

Ready to Get Started?

The Universal Data Migration service is generally available now. To qualify, organizations must migrate and commit to maintaining at least 10TB in Backblaze B2 for a minimum of one year. For more information or to set up a free proof of concept, contact the Backblaze Sales team.

The post Announcing Backblaze B2’s Universal Data Migration appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

Backblaze Partner API: The Details

Post Syndicated from Elton Carneiro original https://www.backblaze.com/blog/backblaze-partner-api-the-details/

Last week, we announced enhancements to our partner program that make working with Backblaze even easier for current and future partners. We shared a deep dive into the first new offering—Backblaze B2 Reserve—on Friday, and today, we’re digging into another key element: the Backblaze Partner API. The Backblaze Partner API enables independent software vendors (ISVs) participating in Backblaze’s Alliance Partner program to add Backblaze B2 Cloud Storage as a seamless backend extension within their own platform.

Read on to learn more about the Backblaze Partner API and what it means for existing and potential Alliance Partners.

What Is the Backblaze Partner API?

With the Backblaze Partner API, ISVs participating in Backblaze’s Alliance Partner program can programmatically provision accounts, run reports, and create a bundled solution or managed service which employs B2 Cloud Storage on the back end while delivering a unified experience to their users.

By unlocking an improved customer experience, the Partner API allows Alliance Partners to build additional cloud services into their product portfolio to generate new revenue streams and/or grow existing margin.

Why Develop the Backblaze Partner API?

We heard frequently from our existing partners that they wanted to provide a more seamless experience for their customers when it came to offering a cloud storage tier. Specifically, they wanted to keep customers on their site rather than requiring them to go elsewhere as part of the sign-up experience. We built the Partner API to deliver this enhanced customer experience while also helping our partners extend their services and expand their offerings.

“Our customers produce thousands of hours of content daily, and, with the shift to leveraging cloud services like ours, they need a place to store both their original and transcoded files. The Backblaze Partner API allows us to expand our cloud services and eliminate complexity for our customers—giving them time to focus on their business needs, while we focus on innovations that drive more value.”
—Murad Mordukhay, CEO, Qencode

What Does the Partner API Do, Specifically?

To create the Backblaze Partner API, we exposed existing functionality to allow partners to automate tasks like creating and ejecting member accounts, managing Groups, and leveraging system-generated reports to get granular billing and usage information—outlining user tasks individually so users can be billed more accurately for what they’ve used.

The API calls are:

  • Account creation (adding Group members).
  • Organizing accounts in Groups.
  • Listing Groups.
  • Listing Group members.
  • Ejecting Group members.

Once the Partner API is configured, developers can use the Backblaze S3 Compatible API or the Backblaze B2 Native API to manage Group members’ Backblaze B2 accounts, including uploading, downloading, and deleting files, as well as creating and managing the buckets that hold those files.
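
As one hypothetical example in Python with boto3: once a Group member account exists and you have an Application Key for it, the day-to-day storage operations are ordinary S3-style calls. The endpoint, bucket, and file names below are placeholders:

import boto3

# Credentials here would be an Application Key scoped to the Group member's account.
s3 = boto3.client(
    "s3",
    endpoint_url="https://s3.us-west-004.backblazeb2.com",  # placeholder endpoint
    aws_access_key_id="<member application key id>",
    aws_secret_access_key="<member application key>",
)

# Create a bucket for the member, upload a file, and list what's stored.
s3.create_bucket(Bucket="member-bucket")
s3.upload_file("report.pdf", "member-bucket", "report.pdf")
for obj in s3.list_objects_v2(Bucket="member-bucket").get("Contents", []):
    print(obj["Key"], obj["Size"])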

How to Get Started With the Backblaze Partner API

If you’re familiar with Backblaze, getting started is straightforward:

  1. Create a Backblaze account.
  2. Enable Business Groups and B2 Cloud Storage.
  3. Contact Sales for access to the API.
  4. Create a Group.
  5. Create an Application Key and set up Partner API calls.

Check out our documentation for more detailed information on getting started with the Backblaze Partner API. You can also reach out to us via email at any time to schedule a meeting to discuss how the Backblaze Partner API can help you create an easier customer experience.

The post Backblaze Partner API: The Details appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

Backblaze B2 Reserve: The Details

Post Syndicated from Elton Carneiro original https://www.backblaze.com/blog/backblaze-b2-reserve-the-details/

Yesterday, we announced enhancements to our partner program that make working with Backblaze even easier for current and prospective partners. Today, we’re digging into a key offering from that program: Backblaze B2 Reserve. Backblaze B2 Reserve brings more value to the Backblaze community of Channel Partners and opens up our easy, affordable cloud storage to many more.

Read on to learn more about Backblaze B2 Reserve and what it means for existing and potential Channel Partners.

What Is Backblaze B2 Reserve?

Predictable, affordable pricing is our calling card, but for a long time our Channel Partners have had a harder time than other customers when it came to accessing this value. Backblaze B2 Reserve brings them a capacity-based, annualized SKU which works seamlessly with channel billing models. The offering also provides seller incentives, Tera-grade support, and expanded migration services to empower the channel’s acceleration of cloud storage adoption and revenue growth.

Why Launch Backblaze B2 Reserve?

Short story, we heard a lot of feedback from our partners about how much they loved working with Backblaze B2 Cloud Storage, except for the service’s pricing model—which limited their ability to promote it to customers. Backblaze B2 is charged on a consumption-based model, meaning you only pay for what you use. This works great for many of our customers who value pay-as-you-go pricing, but not as well for those who value fixed, predictable, monthly or annual bills.

Customers who are more accustomed to planning for storage provisioning want to pay for cloud storage on a capacity-based model similar to how they would for on-premises storage. They buy what they expect to use up front, and their systems and processes are set up to utilize storage in that way. Additionally, the partners who include Backblaze B2 as part of packages they sell to their customers wanted predictable pricing to make things easier in their sales processes.

Backblaze B2 Reserve is a pricing package built to answer these needs—serving the distributors and value-added resellers who want to be able to present B2 Cloud Storage to their current and prospective customers.

How Does Backblaze B2 Reserve Work?

The Backblaze B2 Reserve offering is capacity-based, starting at 20TB, with key features, including:

  • Free egress up to the amount of storage purchased per month.
  • Free transaction calls.
  • Enhanced migration services.
  • No delete penalties.
  • Tera support.

A customer can purchase more storage by buying 10TB add-ons. If you’re interested in participating or just want to learn more, you can reach out to us via email to schedule a meeting.

How Is Backblaze B2 Reserve Different From Backblaze B2?

The main difference between Backblaze B2 Reserve and Backblaze B2 is the way the service is packaged and sold. Backblaze B2 uses a consumption model—you pay for what you use. Backblaze B2 Reserve uses a capacity model—you pay for a specific amount of storage up front.

“Backblaze’s ease and reliability, paired with their price leadership, has always been attractive, but having their pricing aligned with our business model will bring them into so many more conversations we’re having across the types of customers we work with.”
—Mike Winkelmann, Cinesys-Oceana

Ready to Get Started?

If you’re going to NAB, April 23-27th, we’ll be there, and we’d love to see you—click here to book a meeting. We’ll also be at the Channel Partners Conference, April 11-14th. Otherwise, reach out to us via email to schedule a chat. Let’s talk about how the new program can move your business forward.

The post Backblaze B2 Reserve: The Details appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.