Tag Archives: B2Cloud

The (New) Perks of Being A Backblaze Stock Owner

2022-09-27 James Kisner

Post Syndicated from James Kisner original https://www.backblaze.com/blog/the-new-perks-of-being-a-backblaze-stock-owner/

At Backblaze, we’re deeply thankful to the communities that have helped us learn and grow over the past 15 years. From our customers, to our partners, to all of our blog readers and social followers—Backblaze wouldn’t be the same without these folks. Over the years we’ve been able to thank our community with giveaways, events, and even Storage Pods. Which is why we’re excited to announce a new perks program for one of our newest communities: shareholders.

Backblaze Launches Program to Reward Investors

Since our IPO on November 11, 2021, we’ve been buoyed by the support and commitment of individual investors, and today we’re launching a program to thank them by adding some perks to being a BLZE stockholder.

We’ve partnered with Stockperks to launch a program that offers the following benefits to investors:

- For current holders of 1+ shares: You’ll receive a sticker pack to outfit your laptop, car, or any other flat surface with fresh branding from Backblaze.
- For 10 shares or more, held for at least one month: You can access one of the following discounted Backblaze products:
  - Backblaze Computer Backup: 3 months of free backup for new and existing customers.
  - Backblaze B2 Cloud Storage: A $20 credit for new and existing customers.
- For 50 shares or more, held for at least one month: You’ll receive a hat with a custom leather patch of the Backblaze logo on the front and our wordmark on the back.

For details on signing up, redeeming your perks, and other terms and conditions (see the fine print), you can download the Stockperks app here. Once you’ve created a profile, you’ll be able to start exploring the $BLZE perks program and hearing from Backblaze management as we provide investor-related updates about the firm.

About Stockperks

Stockperks is reimagining and revolutionizing how retail investors and companies connect. It’s the first multi-channel marketplace where individual investors get the perks of company ownership, companies create a community of engaged, informed and loyal individual investors, and everyone is invested in the company’s success.

Why Stockperks, and What’s Next?

From day one as a public company, we’ve tried to engage our community as deeply as possible. From inviting customers to participate in our initial public offering, to gathering investor questions through the SAY Connect platform prior to earnings calls, to this new partnership with Stockperks, our approach reflects the customer-centric way we have run the business since we were founded. Growing a public company and investing in public companies can be best viewed as long-term commitments, and our engagement with the community is the same.

We’re also pleased to share that in Q4 of this year we plan to launch our “Stocks and Storage” video blog (vlog). In the tradition of our widely-read blog, our goal with Stocks and Storage vlog is to provide useful, relevant information to our viewership. But while our blog primarily focuses on storage topics, the Stocks and Storage vlog aims to demystify financial topics (what exactly is EBITDA, anyway?) from the perspective of a newly-public technology company in Silicon Valley.

We thank you for your support and interest, and look forward to continuing our journey together on our mission to make storing, using, and protecting data astonishingly easy.

The post The (New) Perks of Being A Backblaze Stock Owner appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

Lights, Camera, Custom Action (Part Two): Inside Integrating Frame.io + Backblaze B2

2022-09-15 Pat Patterson

Post Syndicated from Pat Patterson original https://www.backblaze.com/blog/lights-camera-custom-action-part-two-inside-integrating-frame-io-backblaze-b2/

Part 2 in a series on the Frame.io/Backblaze B2 integration, focusing on the implementation. See Part 1 here, which covers the UI.

In Lights, Camera, Custom Action: Integrating Frame.io with Backblaze B2, we described a custom action for the Frame.io cloud-based media asset management (MAM) platform. The custom action allows users to export assets and projects from Frame.io to Backblaze B2 Cloud Storage and import them back from Backblaze B2 to Frame.io.

The custom action is implemented as a Node.js web service using the Express framework, and its complete source code is open-sourced under the MIT license in the backblaze-frameio GitHub repository. In this blog entry we’ll focus on how we secured the solution, how we made it deployable anywhere (including to options with free bandwidth), and how you can customize it to your needs.

What is a Custom Action?

Custom Actions are a way for you to build integrations directly into Frame.io as programmable UI components. This enables event-based workflows that can be triggered by users within the app, but controlled by an external system. You create custom actions in the Frame.io Developer Site, specifying a name (shown as a menu item in the Frame.io UI), URL, and Frame.io team, among other properties. The user sees the custom action in the contextual/right-click dropdown menu available on each asset.

When the user selects the custom action menu item, Frame.io sends an HTTP POST request to the custom action URL, containing the asset’s id. For example:

{
  "action_id": "2444cccc-7777-4a11-8ddd-05aa45bb956b",
  "interaction_id": "aafa3qq2-c1f6-4111-92b2-4aa64277c33f",
  "resource": {
    "type": "asset",
    "id": "9q2e5555-3a22-44dd-888a-abbb72c3333b"
  },
  "type": "my.action"
}

The custom action can optionally respond with a JSON description of a form to gather more information from the user. For example, our custom action needs to know whether the user wishes to export or import data, so its response is:

{
  "title": "Import or Export?",
  "description": "Import from Backblaze B2, or export to Backblaze B2?",
  "fields": [
    {
      "type": "select",
      "label": "Import or Export",
      "name": "copytype",
      "options": [
        {
          "name": "Export to Backblaze B2",
          "value": "export"
        },
        {
          "name": "Import from Backblaze B2",
          "value": "import"
        }
      ]
    }
  ]
}

When the user submits the form, Frame.io sends another HTTP POST request to the custom action URL, containing the data entered by the user. The custom action can respond with a form as many times as necessary to gather the data it needs, at which point it responds with a suitable message. For example, when it has all the information it needs to export data, our custom action indicates that an asynchronous job has been initiated:

{
  "title": "Job submitted!",
  "description": "Export job submitted for asset."
}
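The request/response exchange above can be sketched as a single function that decides what to send back for each POST. This is a minimal illustration, not the code from the repository: the function name `nextResponse` is hypothetical, and it assumes that a form submission carries the user’s answers in a `data` object keyed by field name (here, `copytype`). The real custom action asks further questions before starting a job; this sketch stops after the first form.

```javascript
// Form shown on the first POST, copied from the example above.
const IMPORT_EXPORT_FORM = {
  title: "Import or Export?",
  description: "Import from Backblaze B2, or export to Backblaze B2?",
  fields: [
    {
      type: "select",
      label: "Import or Export",
      name: "copytype",
      options: [
        { name: "Export to Backblaze B2", value: "export" },
        { name: "Import from Backblaze B2", value: "import" },
      ],
    },
  ],
};

// Hypothetical helper: choose the JSON response for one custom action POST.
// Assumption: submitted form values arrive in `body.data`.
function nextResponse(body) {
  const copytype = body.data && body.data.copytype;

  // No valid answer yet, so ask the user by responding with the form.
  if (copytype !== "export" && copytype !== "import") {
    return IMPORT_EXPORT_FORM;
  }

  // A real handler would start the asynchronous job here before replying.
  const verb = copytype === "export" ? "Export" : "Import";
  return {
    title: "Job submitted!",
    description: `${verb} job submitted for ${body.resource.type}.`,
  };
}
```

Keeping the decision in a function that takes the parsed body and returns plain JSON makes it easy to test without a running web server.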

Securing the Custom Action

When you create a custom action in the Frame.io Developer Tools, a signing key is generated for it. The custom action code uses this key to verify that the request originates from Frame.io.

When Frame.io sends a POST request, it includes the following HTTP headers:

`X-Frameio-Request-Timestamp`	The time the custom action was triggered, in Epoch time (seconds since midnight UTC, Jan 1, 1970).
`X-Frameio-Signature`	The request signature.

The timestamp can be used to prevent replay attacks; Frame.io recommends that custom actions verify that this time is within five minutes of local time. The signature is an HMAC SHA-256 hash secured with the custom action’s signing key—a secret shared exclusively between Frame.io and the custom action. If the custom action is able to correctly verify the HMAC, then we know that the request came from Frame.io (message authentication) and it has not been changed in transit (message integrity).

The process for verifying the signature is:

- Combine the signature version (currently “v0”), timestamp, and request body, separated by colons, into a string to be signed.
- Compute the HMAC SHA256 signature using the signing key.
- If the computed signature and signature header are not identical, then reject the request.

The custom action’s verifyTimestampAndSignature() function implements the above logic, throwing an error if the timestamp is missing or outside the accepted range, or if the signature is invalid. In all cases, 403 Forbidden is returned to the caller.
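The verification steps above can be sketched with Node.js’s built-in crypto module. This is an illustrative version rather than the repository’s implementation: the names `computeSignature` and `checkFrameioRequest` are hypothetical, and the sketch assumes the signature header holds the version, an equals sign, and the hex-encoded HMAC (for example `v0=ab12...`), which the text above does not spell out.

```javascript
const crypto = require("node:crypto");

const SIGNATURE_VERSION = "v0";
const MAX_SKEW_SECONDS = 5 * 60; // the five-minute window recommended above

// Build the string to be signed and compute its HMAC SHA-256.
// Assumption: the header format is "<version>=<hex digest>".
function computeSignature(signingKey, timestamp, rawBody) {
  const message = `${SIGNATURE_VERSION}:${timestamp}:${rawBody}`;
  const digest = crypto
    .createHmac("sha256", signingKey)
    .update(message)
    .digest("hex");
  return `${SIGNATURE_VERSION}=${digest}`;
}

// Throws if the request should be rejected; the caller maps that to 403.
function checkFrameioRequest({
  signingKey,
  timestamp, // value of X-Frameio-Request-Timestamp
  signature, // value of X-Frameio-Signature
  rawBody, // request body exactly as received, before JSON parsing
  nowSeconds = Math.floor(Date.now() / 1000),
}) {
  if (!timestamp || !signature) {
    throw new Error("Missing timestamp or signature header");
  }
  const sent = Number(timestamp);
  if (!Number.isFinite(sent) || Math.abs(nowSeconds - sent) > MAX_SKEW_SECONDS) {
    throw new Error("Timestamp outside accepted range");
  }
  const expected = Buffer.from(computeSignature(signingKey, timestamp, rawBody));
  const received = Buffer.from(String(signature));
  // timingSafeEqual throws on unequal lengths, so compare lengths first.
  if (
    expected.length !== received.length ||
    !crypto.timingSafeEqual(expected, received)
  ) {
    throw new Error("Invalid signature");
  }
}
```

The signature must be computed over the raw request body, since re-serializing parsed JSON can change whitespace or key order and so change the HMAC. The constant-time comparison avoids leaking how many leading characters of a forged signature were correct.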

Custom Action Deployment Options

The root directory of the backblaze-frameio GitHub repository contains three directories, comprising two different deployment options and a directory containing common code:

node-docker: generic Node.js deployment
node-risingcloud: Rising Cloud deployment
backblaze-frameio-common: common code

The node-docker directory contains a generic Node.js implementation suitable for deployment on any Internet-addressable machine, such as an Optimized Cloud Compute VM on Vultr. The app comprises an Express web service that handles requests from Frame.io, providing form responses to gather information from the user, and a worker task that the web service executes as a separate process to actually copy files between Frame.io and Backblaze B2.

You might be wondering why the web service doesn’t just do the work itself, rather than spinning up a separate process to do so. Well, media projects can contain dozens or even hundreds of files, containing a terabyte or more of data. If the web service were to perform the import or export, it would tie up resources and ultimately be unable to respond to Frame.io. Spinning up a dedicated worker process frees the web service to respond to new requests while the work is being done.
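As a rough illustration of that hand-off, the sketch below launches a worker as a separate Node.js process and returns immediately. It is not the repository’s code: the script name `worker.js`, the function names, and the choice to pass the job as a JSON command-line argument are all assumptions made for the example.

```javascript
const { spawn } = require("node:child_process");

// Hypothetical worker script name; not taken from the repository.
const WORKER_SCRIPT = "worker.js";

// Build the command line for one copy job. The job description travels
// to the worker as a single JSON argument.
function workerCommand(job) {
  return {
    command: process.execPath, // the same Node.js binary running the service
    args: [WORKER_SCRIPT, JSON.stringify(job)],
  };
}

// Launch the worker and return right away, so the web service can send
// its "Job submitted!" response without waiting for the copy to finish.
function startWorker(job) {
  const { command, args } = workerCommand(job);
  const child = spawn(command, args, { detached: true, stdio: "ignore" });
  child.on("error", (err) => {
    console.error(`Worker failed to start: ${err.message}`);
  });
  child.unref(); // do not keep the service's event loop alive for this child
  return child.pid;
}
```

A call such as `startWorker({ copytype: "export", assetId: "..." })` would then run alongside the web service, which stays free to answer the next request from Frame.io.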

The downside of this approach is that you have to deploy the custom action on a machine capable of handling the peak expected load. The node-risingcloud implementation works identically to the generic Node.js app, but takes advantage of Rising Cloud’s serverless platform to scale elastically. A web service handles the form responses, then starts a task to perform the work. The difference here is that the task isn’t a process on the same machine, but a separate job running in Rising Cloud’s infrastructure. Jobs can be queued and new task instances can be started dynamically in response to rising workloads.

Note that since both Vultr and Rising Cloud are Backblaze Compute Partners, apps deployed on those platforms enjoy zero-cost downloads from Backblaze B2.

Customizing the Custom Action

We published the source code for the custom action to GitHub under the permissive MIT license. You are free to “use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software” as long as you include the copyright notice and MIT permission notice when you do so.

At present, the user must supply the name of a file when importing an asset from Backblaze B2, but it would be straightforward to add code to browse the bucket and allow the user to navigate the file tree. Similarly, it would be straightforward to extend the custom action to allow the user to import a whole tree of files based on a prefix such as raw_footage/2022-09-07. Feel free to adapt the custom action to your needs; we welcome pull requests for fixes and new features!

The post Lights, Camera, Custom Action (Part Two): Inside Integrating Frame.io + Backblaze B2 appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

Quest Integrates Backblaze into Rapid Recovery Version 6.7

2022-09-12 Jennifer Newman

Post Syndicated from Jennifer Newman original https://www.backblaze.com/blog/quest-integrates-backblaze-into-rapid-recovery-version-6-7/

It’s the classic “tree falls in the woods” scenario: if your company experiences data loss, but your users never feel it, did it even happen? That’s the value proposition our friends over at Quest—an IT platform that solves complex problems with simple software solutions—present with their popular Rapid Recovery tool:

Back up and quickly recover anything — systems, apps, and data — anywhere, whether it’s physical, virtual, or in the cloud. This data recovery software allows you to run without restore, with zero impact on your users, and as if the outage or data loss never happened.

Quest Rapid Recovery Version 6.7 Adds Backblaze B2 in Cloud Tier

As of today, whether you’re a Quest customer or a Backblaze B2 Cloud Storage user, you can combine all this value with the astonishingly easy cloud storage we’re known for here at Backblaze. In Quest’s 6.7 release of Rapid Recovery, navigate to Cloud Accounts in the menu and click Add New Account.

Enter a display name, select B2 Cloud Storage, and choose Amazon S3 as the cloud type. Then you just need to enter your Access key (keyID), Secret Key (applicationKey), and your service endpoint URL.

Your data will be safe, useful, and affordable at a quarter of the price of legacy cloud providers. Try it out today or contact our sales team to learn more.

So, What’s Changed?

If you’re already a Quest Rapid Recovery user, you may notice that setup hasn’t changed. What’s changed is actually in the code—Rapid Recovery will now work more seamlessly and more efficiently. Bug fixes have been baked into version 6.7 and their support will be more robust. We love a seamless partnership—and stay tuned for more integrations between Quest and Backblaze in the future!

More About Quest’s Rapid Recovery Tool

If you’re a Backblaze B2 Cloud Storage user who is in the market for a recovery solution for your business, you can dig into the details about Rapid Recovery here. Here’s a brief primer on the solution’s capabilities:

Simplify backup and restore: One cloud-based management console allows you to restore lost or damaged systems and data with near-zero recovery time and no impact to users—an advanced, admin-friendly solution.
Address demanding recovery point objectives (RPO): Leverage image-based snapshots for RPOs and reduce risk of data loss and downtime with tracked change blocks to accelerate backups and reduce storage.
Wide application support: Lightning-fast recovery for file servers and applications on both Microsoft Windows and Linux systems gets business-critical applications online to keep your business rolling.
Cloud-based backup, archive, and disaster recovery: (This is where we come in…) Point-and-click cloud connectivity makes for easy replication of application backups for no-stress cloud backup.
Virtual environment protection: Agentless backup and recovery for Microsoft Exchange and SQL databases residing on your virtual machines and low-cost VM-only tiered licensing for on-premises and cloud virtual environments.
Data deduplication and replication: With B2 Cloud storage, you’ll already save upwards of 75% versus other cloud storage solutions, but you can reduce costs further by leveraging built-in compression and deduplication. Nice.

More about Backblaze B2 Cloud Storage

Backblaze B2 Cloud Storage is purpose-built for ease, instant access to files and data, and infinite scalability. Backblaze B2 is priced so users don’t have to choose between what matters and what doesn’t when it comes to backup, archive, data organization, workflow streamlining, and more. Signing up couldn’t be more simple: a few clicks and you’re storing data. The first 10GB is free, and if you need more capacity to run a proof of concept you can talk to our sales team. Otherwise, when you’re ready to store data, you can pay one of two ways:

Our per-byte consumption pricing: Only pay for what you store. It’s $5/TB/month, no hidden delete fees or gotchas. What you see is what you get.
Our B2 Reserve capacity pricing: If you’d like to buy predictable blocks of storage, you can work with any of our reseller partners to unlock the following benefits:
- Free egress up to the amount of storage purchased per month.
- Free transaction calls.
- Enhanced migration services.
- No delete penalties.
- Tera support.

The Answer to the Question

You all can debate the philosophical implications of trees falling in woods and the sound they make. But when it comes to Rapid Recovery, it seems like we can guarantee one thing: your users might not hear the data loss when it happens, but you can bet the sigh of relief your IT team breathes when they rapidly recover will be audible.

The post Quest Integrates Backblaze into Rapid Recovery Version 6.7 appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

Backblaze Rides the Nautilus Data Center Wave

2022-09-08

Post Syndicated from original https://www.backblaze.com/blog/backblaze-rides-the-nautilus-data-center-wave/

On the outside and on the inside, our newest data center (DC) is more than a little different: there are no cooling towers chugging on the roof, no chillers, or coolants at all. No, we’re not doing a drive stats experiment on how well data centers run at 54° Celsius. This data center, owned and developed by Nautilus Data Technologies, is nice and cool inside. Built with a unique mix of proven maritime and industrial water-cooling technologies that use river water to cool everything inside—racks, servers, switches, and people—this new DC is innovative, environmentally awesome, secure, fascinating, and other such words and phrases, all rolled into one. And it just happens to be located on a barge on a river in California.

It’s a unique setup, one that might raise a few eyebrows. It certainly did for us. But once our team dug in, we didn’t just find room for another exabyte of data, we found an extremely resilient data center that supports our durability requirements and decreases our environmental impact of providing you cloud storage. You can do a deep dive into the Nautilus technology on their website, but of course I needed to make my own visit to look into this shiny new tech on my own. What follows is an overview of what I learned: how the technology works and why we decided to make the Nautilus data center part of our cloud storage platform.

Nautilus Data Center Overview

In the Port of Stockton in California, an odd-looking barge is moored next to the shore of the San Joaquin River. If you were able to get close enough, you might notice the massive mooring poles the barge is attached to. And if you were a student of such things, you might recognize these mooring poles as having the same rating as the mooring poles whose attached boats and barges survived Hurricane Katrina. The barge isn’t going anywhere.

Above deck are the data center halls. Once inside, it feels like, well, a data center—almost. The power distribution units (PDUs) and other power-related equipment hum quietly and racks of servers and networking gear are lined up across the floor, but there are no hot and cold aisles, and no air conditioning grates or ductwork either. Instead the ceiling is lined with an orderly arrangement of pipes carrying water that’s been cooled by the river outside.

Upriver from the data center, water is collected from the river and filtered before running through the heat exchanger that cools water circulating in a closed loop inside the data center. River water never enters the data center hall.

The technology used to collect and filter the water has been used for decades in power plants, submarines, aircraft carriers, and so on. The entire collection system is marine wildlife-friendly and certified by multiple federal and state agencies, commissions, and water boards, including the California Department of Fish and Wildlife. One of the reasons Nautilus chose the Port of Stockton was the truism that, if you can get something certified for operation in the state of California, then you can typically get it certified pretty much anywhere.

Inside the data center, at specific intervals, water supply and return lines run down to the rear door on each rack. The server fans expel hot air through the rear door and the water inside the door removes the heat to deliver cool air into the room. We use ColdLogik Rear Door Coolers to perform the heat exchange process. The closed loop water system is under vacuum—meaning that it’s leak proof, so water will never touch the servers. A nice bit of innovation by the Nautilus designers and engineers.

Downriver from the data center, the water is discharged. The water can be up to 4° Fahrenheit warmer than when it started upriver. As we mentioned before, the various federal and state authorities worked with Nautilus engineers to select a discharge location which was marine wildlife-friendly. Within seconds of being discharged the water is back to river temperature and continues its journey to the Sacramento Delta. The water spends less than 15 seconds end-to-end in the system which operates with no additional water, uses no chemicals, and adds zero pollutants to the river.

Why Nautilus

For Backblaze, the process of choosing a data center location is a bit more rigorous than throwing a dart at a map and putting some servers there. Our due diligence checklist is long and thorough, taking into consideration redundancy, capacity, scalability, cost, network providers, power providers, stability of the data center owner, and so on. The Nautilus facility passed all of our tests and will enable us to store over an exabyte of data on-site to optimize our operational scalability. In addition, the Nautilus relationship brings us a few additional benefits not traditionally heard of when talking about data centers.

Innovation

Storage Pods, Drive Farming, Drive Stats, and even Backblaze B2 Cloud Storage are all innovations in their own way as they changed market dynamics or defined a different way to do things. They all have in common the trait of putting together proven ideas and technologies in a new way that adds value to the marketplace. In this case, Nautilus marries proven maritime and industrial water cooling and distribution technologies with a new approach to data center infrastructure. The result is an innovative way to use a precious resource to help meet the ever-increasing demand for data storage. This is the kind of engineering and innovation we admire and respect.

Environmental Awesomeness

We can appreciate the environmental impact of the Nautilus data center from two perspectives. The first is obvious: taking a precious resource, river water, and using it to not only lower the carbon footprint of the data center (Nautilus projects by up to 30%), but to also do so without permanently affecting the resource and ecosystem. That’s awesome. The world has been harnessing the power of Mother Nature for thousands of years, yet doing so responsibly has not always been top-of-mind in the process. In the case of Nautilus, the environmental impact is at the top of their list.

The second reason this is awesome is that Nautilus chose to do this in California, coming face-to-face with probably the most stringent environmental requirements in the United States. Almost anywhere else would have been easier, but if you are looking to show your environmental credibility and commitment, then California is the place to start. We commend them for their effort.

Unique Security

Like any well-run data center site, Nautilus has a multitude of industry standard security practices in place: a 24x7x365 security staff, cameras, biometric access, and so on. But the security doesn’t stop there. Being a data center on a barge also means that divers regularly inspect the underwater systems and the barge itself for maintenance and security purposes. In addition, by nature of being a data center on a barge in the Port of Stockton, the data center has additional security: the port itself is protected by the U.S. Department of Homeland Security (DHS) and the waterways are patrolled by the U.S. Coast Guard. This enhanced collection of protective resources is unique for data centers in the U.S., except possibly the kind of data centers we are not supposed to know anything about.

The Manatee in the River

Let’s get to the elephant in the room here: is there risk in putting a data center on a barge in a river? Yes—but no more so than putting one in a desert, or near any body of water, or near a forest, or in an abandoned mine, or near a mountain, or in a city. You get the idea: they all have some level of risk. We’d argue that this new data center—with its decreased reliance on energy and air conditioning and its protection by DHS, among other positives—is quite a bit more reliable than most places the world stores its data. As always, though, we continue to encourage folks to have their data in multiple places.

Still, putting a data center on a river is novel. We’re sure some people will make jokes, and probably pretty funny ones—we’re happy to laugh at our own expense. (It’s certainly happened before.) We are also sure some competitors will use this as part of their sales and marketing—FUD (fear, uncertainty and doubt) as it is called behind your back. We don’t play that game, and, as with our past innovations, we’re used to people sniping a bit when we move out ahead on technology. As always, we encourage you to dig in, get the facts, and be comfortable with the choice you make. Here at Backblaze, we won’t sell you up the river, but we may put your data there.

The post Backblaze Rides the Nautilus Data Center Wave appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

Lights, Camera, Custom Action: Integrating Frame.io with Backblaze B2

2022-09-06 Pat Patterson

Post Syndicated from Pat Patterson original https://www.backblaze.com/blog/lights-camera-custom-action-integrating-frame-io-with-backblaze-b2/

At Backblaze, we love hearing from our customers about their unique and varied storage needs. Our media and entertainment customers have some of the most interesting use cases and often tell us about their workflow needs moving assets at every stage of the process, from camera to post-production and everywhere in between.

The desire to have more flexibility controlling data movement in their media management systems is a consistent theme. In the interest of helping customers with not just storing their data, but using their data, today we are publishing a new open-source custom integration we have created for Frame.io. Read on to learn more about how to use Frame.io to streamline your media workflows.

What is Frame.io?

Frame.io, an Adobe company, has built a cloud-based media asset management (MAM) platform allowing creative professionals to collaborate at every step of the video production process. For example, videographers can upload footage from the set after each take; editors can work with proxy files transcoded by Frame.io to speed the editing process; and production staff can share sound reports, camera logs, and files like Color Decision Lists.

The Backblaze B2 Custom Action for Frame.io

Creative professionals who use Frame.io know that it can be a powerful tool for content collaboration. Many of those customers also leverage Backblaze B2 for long-term archive, and often already have large asset inventories in Backblaze B2 as well.

What our Backblaze B2 Custom Action for Frame.io does is quite simple: it allows you to quickly move data between Backblaze B2 and Frame.io. Media professionals can use the action to export selected assets or whole projects from Frame.io to B2 Cloud Storage, and then later import exported assets and projects from B2 Cloud Storage back to Frame.io.

How to Use the Backblaze B2 Custom Action for Frame.io

Let’s take a quick look at how to use the custom action.

After you enable the Custom Action, a new option appears in the asset context dropdown. Once you select the action, you are presented with a dialog to select Import or Export of data.

After selecting Export, you can choose whether you want just the single selected asset, or the entire project sent to Backblaze B2.

Once you make a selection, that’s it! The custom action handles the movement for you behind the scenes. The export is a point-in-time snapshot of the data from Frame.io—which remains as it was—to Backblaze B2.

The Custom Action creates a new exports folder in your B2 bucket, and then uploads the asset(s) to the folder. If you opt to upload the entire Project, it will be structured the same way it is organized in Frame.io.

How to Get Started With Backblaze B2 and Frame.io

To get started using the Custom Action described above, you will need:

A Frame.io account.
Access to a compute resource to run the custom action code.
A Backblaze B2 account.

If you don’t have a Backblaze B2 account yet, you can sign up here and get 10GB free, or contact us here to run a proof of concept with more than 10GB.

What’s Next?

We’ve written previously about similar open-sourced custom integrations for other tools, and by releasing this one we are continuing in that same spirit. If you are interested in learning more about this integration, you can jump straight to the source code on GitHub.

Watch this space for a follow-up post diving into more of the technical details. We’ll discuss how we secured the solution, made it deployable anywhere (including to options with free bandwidth), and how you can customize it to your needs.

We would love to hear your feedback on this integration, and also any other integrations you would like to see from Backblaze. Feel free to reach out to us in the comments below or through our social channels. We’re particularly active on Twitter and Reddit—let’s chat!

The post Lights, Camera, Custom Action: Integrating Frame.io with Backblaze B2 appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

Five Misconceptions About Moving From Tape to Cloud

2022-09-02 Jeremy Milk

Post Syndicated from Jeremy Milk original https://www.backblaze.com/blog/five-misconceptions-about-moving-from-tape-to-cloud/

There are a lot of pros and cons that go along with using the old, reliable LTO system for your backups. And while the medium still has many adherents, there is a growing movement of folks looking to move beyond magnetic tape, a form of storage technology that has been around since 1928. Technically, it’s the same age as sliced bread.

Those working in IT already know the benefits of migrating from LTO to cloud storage, which include everything from nearly universal ease of access to reduced maintenance, but those who hold the company’s purse strings might still need convincing. Some organizations delay making a move because of misconceptions about the cost, inconvenience, risk, and security, but they may not have all the details. Let’s explore five top misconceptions about migrating from tape so you can help your team make an informed decision.

Misconception #1 – Total Cost of Ownership is Higher in the Cloud

The first misconception is that moving from a tape-based backup solution to cloud storage is expensive. Using our LTO vs. B2 Cloud Storage calculator, you can enter the amount of existing data you have, the amount of data you add yearly, and your daily incremental data to determine the actual cost savings.

For example, say you have 50TB of existing data, you add 5TB more every year, and your daily incremental backup data is 500GB. If that were the case, you could expect to pay almost 75% less backing up with cloud storage versus tape. The calculator also includes details about the assumptions we used in the computations so you can adjust accordingly. These assumptions include the LTO Backup Model, Data Compression Ratio and Data Retention Policy, as well as a handful of others you can dig into on your own if you’d like to fine tune the math.

Misconception #2 – Migration Costs are Impossible to Manage

We have shown how much more affordable it is to store on the cloud vs. on tape, but what about the costs of moving all of your data? Everyone with a frequently accessed data archive, and especially those serving data to end users, lives in fear of large egress fees. Understandably, the idea of paying egress fees for ALL of their data at once can be paralyzing. There is one service available today that pays for your data migration: egress fees, transfer costs, administration, all of it.

The new Universal Data Migration (UDM) service covers data migration fees, including any legacy provider egress fees, for customers in the US, Canada, and Europe storing more than 10TB. The service offers a suite of tools and resources to make moving your data over to cloud storage a breeze, including high-speed processes for reading tape media (reel cassettes and cartridges) and transferring directly to Backblaze B2 via a high-speed data connection. This all comes with full solution engineer support throughout the process and beyond. Data is transferred quickly and securely within days, not weeks.

Short story: Even if it might feel like it some days, your data does not have to be held hostage by egress expenses. Migration can be the opposite of a “killer”: it can open your budget for other investments and free your teams to access the data they need whenever they need it.

Misconception #3 – Cloud Storage Is a Security Risk

A topic on everyone’s minds these days is security. It’s reasonable to worry about risks when transitioning from tapes stored on-premises or off-site to the cloud. You can see the tapes on site; they’re disconnected from the internet and locked in a storage space on your property. When it comes to cybercriminals accessing data, you’re breathing easy. Statistics on data breaches and ransomware show that businesses of every size are at risk when it comes to cyberattacks, so this is an understandable stance. But when you look at the big picture, the cloud can offer greater peace of mind across a wide range of risks:

- Cut Risk by Tiering Data Off Site: Cybercrime is certainly a huge threat, so it’s wise to keep it front of mind in your planning. There are a number of other risk factors that deserve equal consideration, however. Whether you live in an area prone to natural disasters, are headquartered in an older building, or just have bad luck, getting a copy of your data offsite is essential to ensuring you can recover from most disasters.
- Apply Object Lock for Virtual Air Gapping: Air gaps used to be the big divider between cloud and tape on the security front. But setting immutability through Object Lock means you can set a virtual air gap on all of your cloud data. This functionality is available through Veeam, MSP360, and a number of other leading backup management software providers. You don’t have to rely on tape to attain an air gap.
- Boost Security without Burdening IT: Cloud storage providers’ full-time job is maintaining the durability of the data they hold. They focus 24/7 on maintenance and upkeep so you don’t have to worry about whether your hardware and software are up to date and properly maintained. No need to sweat security updates, patches, or dealing with alerts. That’s your provider’s problem.

Misconception #4 – It’s All or Nothing with Data Migration

For certain industries, regulations require that certain data sets stay on-site. In the past, managing some data on-site and some in the cloud was just too much of a hassle. But hybrid services have come a long way toward making the process smoother and more efficient.

For all of your data that doesn’t have to stay on-site, you could start using cloud storage for daily incremental backups today, while keeping your tape system in place for older archived data. Not only would this save you the time spent managing as many tapes, but you could also restore the cloud-based files instantly if you need to.

Using software such as StarWind VTL or Archiware P5, you can start backing up to the cloud instantly and make the job of migrating more manageable.

The Hybrid Approach

If you’re not able to go all in on the cloud approach right away, you may want to continue to keep some archived data on tape and move over any current data that is more critical. A hybrid system gives you options and allows you to make the transition on your schedule.

Some of the ways companies execute the hybrid model are:

- Date Hybrid: Pick a cut-off date; everything after that date is stored in cloud storage and everything before stays on tape.
- Classic Hybrid: Full backups remain on tape and incremental data is stored in the cloud.
- Type Hybrid: You might store some data types on tape and others in the cloud. For example, perhaps you store employee files on tape and customer data in cloud storage.
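
The three hybrid models above boil down to a simple routing rule for each backup item. Here is a minimal sketch of that rule; the function name, the policy labels, and the choice to send items dated exactly on the cut-off to the cloud are all assumptions made for illustration, not part of any backup product.

```python
from datetime import date


def choose_target(policy, *, backup_date=None, cutoff=None,
                  is_full_backup=False, data_type=None,
                  tape_types=frozenset()):
    """Return "tape" or "cloud" for one backup item under a hybrid policy."""
    if policy == "date":
        # Assumption: items dated on the cut-off itself go to the cloud.
        return "cloud" if backup_date >= cutoff else "tape"
    if policy == "classic":
        # Full backups stay on tape, incrementals go to the cloud.
        return "tape" if is_full_backup else "cloud"
    if policy == "type":
        # Listed data types stay on tape, everything else goes to the cloud.
        return "tape" if data_type in tape_types else "cloud"
    raise ValueError(f"unknown policy: {policy}")


print(choose_target("date", backup_date=date(2022, 9, 1), cutoff=date(2022, 1, 1)))
print(choose_target("classic", is_full_backup=True))
print(choose_target("type", data_type="employee_files",
                    tape_types=frozenset({"employee_files"})))
```

Writing the rule down this way makes it easy to audit where any given backup should live before you commit to one model.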

Regardless of how you choose to break it up, the hybrid model makes it faster and easier to migrate.

Misconception #5 – The Costs Outweigh the Benefits

If you’re going to go through the process of migrating your data from LTO to the cloud—even though we’ve shown it to be fairly painless—you want to make sure there’s an upside, right?

Let’s start with the simple ease of access. With tape storage, access is limited by the nature of physical media. You have to be on premises to locate the data you need (no small feat if you have a catalog of tapes to sort through).

By putting all that data in the cloud, you enable instant access to anyone in your organization with the right provisions. This shifts hours of burden from your IT department, helping the organization get more out of the resources and infrastructure they already have.

Bonus Pro-Tip: Use a “Cheat Sheet” or Checklist to Convince Your CFO or COO

When you pitch the idea of migrating from tape to cloud storage to your CFO or COO, you can allay their fears by presenting them with a cheat sheet or checklist that proactively addresses any concerns they might have.

Some things to include in your cheat sheet are basically what we’ve outlined above: First, that cloud storage is not more expensive than tape; it actually saves you money. Second, using a hybrid model, you can move your data over conveniently on your own time. Third, there is no cost to you to migrate your data using our UDM service. Finally, your data is fully protected against loss and secured by Object Lock to keep it safe and sound in the cloud.

Migration Success Stories

Check out these tape migration success stories to help you decide if this solution is right for you.

Kings County, CA

Kings County, California, experienced a natural disaster that destroyed their tapes and tape drive, leaving the county facing an $80,000 price tag to continue backing up critical county data like HIPAA records and legal information. John Devlin, CIO of Kings County, decided it was time for a change. His plan was to move away from capital expenditures (tapes and tape drives) to operating expenses like cloud storage and backup software. After much debate, Kings County decided on Veeam Software paired with Backblaze B2 Cloud Storage for its backup solution, and it’s been smooth sailing ever since!

Austin City Limits

Austin City Limits is a public TV program that has stored more than 4,000 hours of priceless live music performances on tape. As those tapes were rapidly beginning to deteriorate, the company opted to transfer recordings to Backblaze B2 Cloud Storage for immediate and ongoing archiving with real-time, hot access. Utilizing a Backblaze Fireball rapid data ingest tool, they were able to securely back up hours of footage without tying up bandwidth. Thanks to their quick actions, irreplaceable performances from Johnny Cash, Stevie Ray Vaughan and The Foo Fighters are now preserved for posterity.

In Summary

So, we’ve covered that moving your backups to a storage cloud can save your organization time and money, is a fairly painless process to undertake, doesn’t present a higher security risk, and creates important geo-redundancies that represent best practices. Hopefully, we’ve helped clear up those misconceptions and we’ve helped you decide whether migrating from tape to cloud storage makes sense for your business.

The post Five Misconceptions About Moving From Tape to Cloud appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

Media Workflowing to Europe: IBC 2022 in Amsterdam Preview

2022-08-31 Jeremy Milk

Post Syndicated from Jeremy Milk original https://www.backblaze.com/blog/media-workflowing-to-europe-ibc-2022-in-amsterdam-preview/

You can send media in milliseconds to just about every corner of the earth with an origin store at your favorite cloud storage company and a snappy content delivery network (CDN). Sadly, delivering people to Europe is a touch more complicated and time intensive. Nevertheless, the Backblaze team is saddling up planes, trains, and automobiles to bring the latest on media workflows to the attendees of IBC 2022. Whether you’re there in person or virtually, we’ll be discussing and demo-ing all the newest Backblaze B2 Cloud Storage solutions that will ensure your data can travel with ease—no mass transit needed—everywhere you need it to be.

Learn More LIVE in Amsterdam

If you’re attending the IBC 2022 conference in Amsterdam, join us at stand 7.B06 to learn about integrating B2 Cloud Storage into your workflow. Stop by anytime or you can schedule a meeting here. We’d love to see you.

IBC 2022 Preview: What’s New for Backblaze B2 Media Workflow Solutions

Our stand will have all the usual goodness: partners, friendly faces, spots to take a load off and talk about making your data work harder, and, of course, some next-level SWAG. Let’s get into what you can expect.

New Pricing Models and Migration Tools

Our team is on hand to talk you through two new offerings that have been generating a lot of excitement among teams across media organizations:

- Backblaze B2 Reserve: You can now purchase the Backblaze service many know and love in capacity-based bundles through resellers. If your team needs 100% budget predictability and would like waived transaction fees and premium support included as well, you should check out this new pricing model. Check it out here.
- Universal Data Migration: This IBC 2022 Best of Show nominee makes it easy and FREE to move data into Backblaze from legacy cloud, on-premises, and LTO/tape origins. If your current data storage is holding your team or your budget back, we’ll pay to free your media and move it to B2 Cloud Storage. Learn more here.

Six Flavors of Media Workflow Deep Dives

We’ve gathered materials and expertise to discuss or demo our six most asked about workflow improvements. We’re happy to talk about many other tools and improvements, but here are the six areas we expect to talk about the most:

- Moving more (or all) media production to the cloud. Ensuring everyone (clients, collaborators, employers, and everyone else) has easy real-time access to content is essential for the inevitable geographical distribution of modern media workflows.
- Reducing costs. Cloud workflows don’t need to come with costly gotchas, minimum retention penalties, and/or high costs when you actually want to use your content. We’ll explain how the right partners will unlock your budget so you can save on cloud services and spend more on creative projects.
- Streamlining delivery. Pairing cloud storage with the right CDN is essential to make sure your media is consumable and monetizable at the edge. From streaming services to ecommerce outlets to legacy media outlets, we’ve helped every type of media organization do more with their content.
- Freeing storage. Empty your expensive on-prem storage and stop adding HDs and tapes to the pile by moving finished projects to always-hot cloud storage. This doesn’t just free up space and money: Instantly accessible archives mean you can work with and monetize older content with little friction in your creative process.
- Safeguarding content. All those tapes or HDs on a shelf, in the closet, or wherever you keep them are hard to manage and harder to access and use. Parking everything safely and securely in the cloud means all that data is centrally accessible, protected, and available for more use.
- Backing up (better!). Yes, we’ve got roots in backup going back more than 15 years, so when it comes to making sure your precious media is protected with easy access for speedy recovery, we’ve got a few thoughts (and solutions).

Partners, Partners, and More Partners…

“The more we get together, the happier we’ll be,” might as well be the theme lyric of cloud workflows. Combining best of breed platforms unlocks better value and functionality, and offers you the ability to build your cloud stack exactly how you need it for your business. We’ve got a massive ecosystem of integration partners to bring to bear on your challenges, and we’re happy to share our IBC 2022 stand with two incredible members of that group: media management and collaboration company iconik and the cloud NAS platform LucidLink.

We’ll be demoing a brand new, free Backblaze B2 Storage Plug-in for iconik, which enables users of Backblaze, iconik, and LucidLink to move files between services in just a click. We’d love to walk you through it.

Hoping We Can Help You Soon

Whether it’s in person at IBC 2022 or virtually when it works for you, we’d love to walk you through any of the solutions we offer for hardworking media teams. If you will be in Amsterdam, schedule a meeting to ensure you’ll get the right expert on our team, then stick around for the swag and good times. If you’re not making the trip, please reach out to us here, where we can share all of the same information.

The post Media Workflowing to Europe: IBC 2022 in Amsterdam Preview appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

ELEMENTS Media Platform Adds Backblaze B2 as Native Cloud Option

2022-08-30 Jennifer Newman

Post Syndicated from Jennifer Newman original https://www.backblaze.com/blog/elements-media-platform-adds-backblaze-b2-as-native-cloud-option/

Cloud workflows are rapidly becoming a driver of every modern media team’s day-to-day creative output. Whether it’s enabling remote shoots, distributed teams, or leveraging budgets more effectively, the cloud can deliver a ton of value to any team. But workflows, by their nature, are complex—and plenty of legacy cloud solutions only add to that complexity with tangled pricing models and limits to egress and collaboration.

ELEMENTS has been simplifying media workflows for more than a decade. Cloud storage has always been part of the ELEMENTS DNA, and at IBC 2022 they’ll be presenting revolutionary platform updates for cloud workflows in 2023 and beyond. Part of this new focus on cloud solutions is the addition of Backblaze B2 Cloud Storage as an easy, transparent cloud storage option on their platform. Nimble post-production teams are always on the lookout for more straightforward and easy-to-understand cloud plans with transparent costs, and this is a market that Backblaze serves more effectively than the legacy providers accessible through the platform.

Learn More About This Solution Live in Amsterdam

If you’re attending the 2022 IBC Conference in Amsterdam, join us at stand 7.B06 to learn about integrating B2 Cloud Storage into your workflow. You can schedule a meeting here.

The Backblaze + ELEMENTS Integration

The ELEMENTS platform makes it simple to upload and download files straight from on-premises storage while also offering smart and fully customisable archiving workflows, cloud-based media asset management (MAM), and a number of other tools and features that remove the borders between cloud and on-premises storage. Once connected, ELEMENTS enables users to search, edit, or automate changes to media assets. This extends to team collaboration and setting rights to folders and data across the connected networks. ELEMENTS has provided an intuitive interface making this an easy-to-use solution that is designed for the M&E industry.

Connecting your Backblaze account with ELEMENTS is easy. Simply navigate to the System > Integrations menu and enter your Backblaze login credentials. After this, Backblaze B2 Buckets of the connected account can be mounted as a volume on ELEMENTS.

If you’d like to run a proof of concept with Backblaze, the first 10 GB is free and setting up a Backblaze account only takes a few clicks. Or you can contact sales for more information.

If you’re already a Backblaze B2 customer and would like to check out the ELEMENTS platform, contact ELEMENTS directly here.

Simplifying Your Workflow AND Your Budget

Backblaze focuses on end-to-end ease, including how it works in your budget. Businesses can select a pay-as-you-go option or work with a reseller to access capacity plans.

B2 Cloud Storage – This is a general cloud plan for applications, backups and almost all of your business needs. The pricing is simple: $5 per TB per month + $0.01 per GB download fee. As with all plans, the files are located on one storage tier and can always be easily accessed.

B2 Reserve – This is the sweet spot for most media use cases. B2 Reserve is a cloud package starting from 20TB per month. This plan comes at a slightly higher cost than the standard B2 Cloud Storage plan but is free from egress fees up to the amount of storage purchased per month. B2 Reserve will quickly work in your favor if you plan on accessing your files regularly. NOTE: B2 Reserve is only available through resellers.
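
To see how the two plans treat downloads, here is a minimal sketch using the pay-as-you-go rates quoted above ($5 per TB per month plus $0.01 per GB downloaded). The function names are hypothetical, the conversion of 1 TB to 1,000 GB is an assumption, and B2 Reserve pricing is not modeled because it is set through resellers; the second helper only splits a month's downloads into the part covered by the Reserve allowance and the part beyond it.

```python
def monthly_b2_cost(stored_tb, downloaded_gb,
                    usd_per_tb_month=5.0, usd_per_gb_download=0.01):
    """Pay-as-you-go estimate: storage charge plus download charge, in USD."""
    storage = stored_tb * usd_per_tb_month
    download = downloaded_gb * usd_per_gb_download
    return round(storage + download, 2)


def reserve_egress_split(purchased_tb, downloaded_gb, gb_per_tb=1000):
    """Return (covered_gb, excess_gb) for a B2 Reserve allowance equal to
    the amount of storage purchased for the month."""
    allowance_gb = purchased_tb * gb_per_tb
    covered = min(downloaded_gb, allowance_gb)
    return covered, downloaded_gb - covered


print(monthly_b2_cost(20, 500))          # 20 TB stored, 500 GB downloaded
print(reserve_egress_split(20, 25000))   # 20 TB purchased, 25,000 GB downloaded
```

The more often you pull files back down, the larger the download term grows relative to storage, which is where the Reserve allowance starts to matter.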

Top Benefits for Teams Using Backblaze and ELEMENTS Together

The ELEMENTS platform offers a set of robust tools that unlock time and budgets for creative teams to do more. We’ll underline how these different features can work with Backblaze B2.

Automation Engine

The ELEMENTS Automation Engine allows users to create workflows with any number of steps. This tool has a growing list of templates, two of which are Archive and Restore automations. These can be used to archive footage to Backblaze and delete it from the on-premises storage while keeping a lightweight preview proxy. If you need the original footage after previewing the proxy, triggering the Restore automation is all you need to do. The hi-res footage will automatically be downloaded from the Backblaze B2 bucket and placed onto the original location.

A huge benefit of using cloud storage through the ELEMENTS platform is that the individual users do not need to have cloud accounts or direct cloud access. Users will only be able to use the cloud features through the preset automation jobs and according to their permissions.

Media Library

Cloud technologies open up a number of new possibilities within the Media Library, our powerful, browser-based media asset management (MAM) platform.

For example, if your post-production facility has a locally-deployed media library which is running on your ELEMENTS storage and is connected to your Backblaze account, users can play back all of your footage at any time, no matter where it is stored: on-premises, in the cloud, or even in your LTO archive.

The Media Library adds a layer of functionality to the cloud and allows you to easily build a true cloud archive—one that can be accessed from anywhere, in which footage can easily be previewed and just as easily restored with a click of a button.

File Manager

The File Manager is a feature of the ELEMENTS Web UI that allows you to browse and manage content on your storage on-premises and, very soon, in the cloud. It provides you with a clear overview of all your files, no matter how many file systems and cloud buckets you have. File Manager’s support for cloud storage means users will be able to manage all of their files in one place, without having to navigate through a host of different cloud providers’ interfaces.

ELEMENTS Client

The ELEMENTS Client is an intuitive connection manager that allows admins to decide who gets to mount what, providing a secure gatekeeper to your footage.

The latest function, coming soon to the ELEMENTS Client, will allow users to mount cloud workspaces. This means that users will be able to access the contents of the Backblaze B2 Bucket as if it were a local drive. With optional access logging, users will have the ability to access the cloud-stored content while admins can maintain a high level of security.

Bringing Independent Cloud Storage to the ELEMENTS Platform

Offering B2 Cloud Storage as a native option within the ELEMENTS platform brings a whole new type of cloud offering to ELEMENTS users. We’re eager to see how creatives use an easier, more affordable, independent option in their workflows.

The post ELEMENTS Media Platform Adds Backblaze B2 as Native Cloud Option appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

Announcing Backblaze B2 Object Lock for MSP360: Enhanced Ransomware Protection

2022-08-25 Jennifer Newman

Post Syndicated from Jennifer Newman original https://www.backblaze.com/blog/announcing-backblaze-b2-object-lock-for-msp360/

The potential threat of ransomware is something every modern business lives with. It’s a frightening prospect, but it’s a manageable risk with technology that’s readily available on many major platforms. Today, we’re adding one more tool to that list: we’re excited to announce that our long-time partners at MSP360 have now made Backblaze B2 Object Lock functionality available to customers that use B2 Cloud Storage as a cloud tier for their backup data.

How Backblaze B2 Object Lock Works with MSP360 Data

Backups are the last line of defense against cyberattacks and accidental data loss. From ransomware attacks to hacking unattended devices, there are plenty of attack vectors available to cybercriminals, not to mention the very real risk of human error. But, when activated by an IT admin in MSP360 Managed Backup 6.0, Backblaze B2 Object Lock provides an additional layer of security to a business’s backups by blocking deletion or modification by anyone (including admins) during a user-defined retention period. Object Lock puts the data in an immutable state: it’s accessible and usable, but it can’t be changed or trashed. For anyone worried about attacks on their last line of defense, this is a huge relief. It’s also increasingly a requirement, as many industries with strict standards now request immutability as proof of compliance.
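
To make the retention behavior concrete, here is a toy, in-memory model of the rule described above: reads always succeed, while overwrites and deletes are refused until the retention period has passed. The class and method names are invented for this sketch; it is not the Backblaze or MSP360 API, and it leaves out versioning and the other details a real object store handles.

```python
from datetime import datetime, timedelta, timezone


class LockedObjectError(Exception):
    """Raised when a write or delete hits an object still under retention."""


class ToyLockedBucket:
    """In-memory stand-in for a bucket with a per-object retention period."""

    def __init__(self):
        self._objects = {}  # name -> (data, retain_until)

    def put(self, name, data, retention_days, now):
        self._check_unlocked(name, now)
        self._objects[name] = (data, now + timedelta(days=retention_days))

    def get(self, name):
        # Locked data stays readable and usable.
        return self._objects[name][0]

    def delete(self, name, now):
        self._check_unlocked(name, now)
        del self._objects[name]

    def _check_unlocked(self, name, now):
        if name in self._objects:
            retain_until = self._objects[name][1]
            if now < retain_until:
                raise LockedObjectError(
                    f"{name} is locked until {retain_until:%Y-%m-%d}")


bucket = ToyLockedBucket()
start = datetime(2022, 8, 25, tzinfo=timezone.utc)
bucket.put("backup-001", b"backup bytes", retention_days=30, now=start)
try:
    bucket.delete("backup-001", now=start + timedelta(days=1))
except LockedObjectError as err:
    print("blocked:", err)
bucket.delete("backup-001", now=start + timedelta(days=31))  # allowed after retention
```

The key point the model captures is that nobody, admin or attacker, gets a code path that bypasses the retention check.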

“Our customers have clients operating in a range of IT environments. With whatever we do, we want to keep that in mind and ensure we provide our customers with options. Offering Backblaze B2 Object Lock to our customers provides them another tool in the fight against ransomware, arguably still cybersecurity’s biggest challenge.”—Brian Helwig, CEO, MSP360

How to Use Object Lock with MSP360 Today

If you’re already using Backblaze B2 as a cloud storage tier for MSP360 and you’re running the latest version, you can choose to enable Object Lock when you create a new bucket. If you’re interested in checking out the joint solution now that Object Lock is enabled, you can learn more here.

“MSP360 has been a long-time partner of Backblaze and continues to impress us with their commitment to delivering a well-rounded platform to customers. We’re very happy to extend Backblaze B2 Object Lock to MSP360’s customers to meet their security, disaster recovery, and cloud storage needs.”—Nilay Patel, Vice President, Sales & Partnerships, Backblaze

Want to Learn More About Object Lock?

Protecting data is one of our favorite things, so, appropriately, we’ve written about the value of Object Lock quite a bit. You can learn more about the basics in this general guide to Object Lock. If you’re interested in how this feature will integrate into your existing security policy, you can read about adding Object Lock to your IT security policy here.

And if you want to hear more from the experts on the subject, register for our webinar, Cybersecurity and the Public Cloud: Cloud Backup Best Practices on September 21. The webinar features John Hill, Cybersecurity Expert; Troy Liljedahl, Director of Solutions Engineering at Backblaze; and David Gugick, VP Product Management at MSP360, and you’ll learn about a few of the common security concerns facing you today as you back up data into the public cloud. If you can’t join us live, the webinar will be available on demand on the Backblaze BrightTALK channel.

We hope these guides can be useful for you, but drop a comment if there’s anything else we can cover.

The post Announcing Backblaze B2 Object Lock for MSP360: Enhanced Ransomware Protection appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

How to Back Up Veeam to the Cloud

2022-08-23 Kari Rivas

Post Syndicated from Kari Rivas original https://www.backblaze.com/blog/how-to-back-up-veeam-to-the-cloud/

Backups are your best defense against ransomware and other types of data loss. Thankfully it is quick and easy to back up all your Veeam data to Backblaze B2 Cloud Storage within minutes—and we have the videos to prove it!

What is Veeam?

Veeam is well-respected backup and disaster recovery software that works across many platforms and hardware/software configurations. Founded in 2006, Veeam Software is a U.S.-based company that operates in over 180 countries and has 400,000 customers—many of them Fortune 500 companies.

The Veeam and Backblaze B2 Cloud Storage Integration

Backblaze has partnered with Veeam to deliver the most reliable, affordable, and secure data protection and cloud storage target for your data. Veeam Backup & Replication provides modern data protection for your cloud, virtual, and physical workloads to solve your challenges around backup, recovery, archive, disaster recovery, and ransomware.

With a transparent pricing model that is a fraction of the competitors’ cost, Backblaze B2 Cloud Storage helps you plan your budget effectively and store more than four times the restore points you could otherwise. With Backblaze B2 as your cloud tier storage destination in Veeam, you can store your data for $5/TB per month with no minimum retention requirement, tiers, or hidden fees.

Additionally, Backblaze is certified as Veeam Ready—Object and Veeam Ready—Object with Immutability. Immutability is an important part of protecting backups from threats such as ransomware or stolen credentials because it allows you to protect objects from being changed, deleted, manipulated, copied, or encrypted for a specified, user-defined time period. Even better, Backblaze does not charge an extra fee for the use of the object lock feature.

How Does Veeam Work with Backblaze B2 Cloud Storage?

Backblaze is a proud partner of Veeam and is fully compatible with Veeam Cloud Tier. Using Backblaze B2’s S3-compatible API, you can set B2 Cloud Storage as your Cloud Tier in Veeam’s Scale-Out Backup Repository.

In Veeam v11 and earlier versions, you must first establish the Performance Tier, or Local Repository, before you can set up the Cloud Tier.

If you’ve been using Veeam, you probably already know how to add a local storage repository to Veeam. However, if you are one of our B2 users who are exploring this partnership for the first time, we have a video to guide you through the process. Watch as Greg Hamer, Senior Developer Evangelist, demonstrates how to set up the Local Repository in just a few minutes. If your Local Repository is already configured, then you’re ready to proceed to cloud backup!

Steps to Back Up Your Data with Veeam and Cloud Storage

To make things easy, we have created a video about How to Back Up Veeam to the Cloud. In the video, Greg demonstrates how you can securely store your Veeam data in just 20 minutes.

If you’re not a visual learner, you can easily back up all of your Veeam data to Backblaze B2 Cloud Storage using the five easy steps below.

Step 1: Create a Backblaze Account

First, you need to set up a Backblaze account. If you already have a Backblaze account, you’re all set and can move on to step two. Otherwise, visit Backblaze’s Veeam page and click the Start Now button to create one.

The Start Now button will take you to a simple sign-up form where you only have to enter your email address and a password. Don’t worry about setting up billing just yet. You have 10GB of free space to test drive B2 Cloud Storage before you have to set up any billing information.

Once you successfully create a new account, you will create a bucket to store your data in, then collect and save some information from your Backblaze dashboard to use later.

Step 2: Create a Backblaze B2 Bucket and Set Up an Application Key

A “bucket” is a container that holds the files your Veeam software uploads to Backblaze B2 Cloud Storage. When configuring your bucket, you will give it a unique name, choose whether it’s private or public (most customers choose private buckets), and turn on Object Lock to secure your files and make them immutable. (This is an important security step you won’t want to miss.)

Each bucket is associated with a name and an S3 Endpoint. You should jot down this Endpoint to use later in Veeam to connect with Backblaze.
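
If you keep the Endpoint in a script or config file, a quick sanity check can catch copy-and-paste mistakes before you reach Veeam. The sketch below assumes the endpoint follows the `s3.<region>.backblazeb2.com` pattern; the helper name and the sample value are illustrative, so use the Endpoint shown for your own bucket.

```python
from urllib.parse import urlparse


def parse_s3_endpoint(endpoint):
    """Split an endpoint such as 's3.us-west-004.backblazeb2.com' into a
    full HTTPS URL and the region label embedded in the host name."""
    url = endpoint if "://" in endpoint else f"https://{endpoint}"
    host = urlparse(url).hostname or ""
    labels = host.split(".")
    # Assumed shape: s3.<region>.<domain>.<tld>
    if len(labels) < 4 or labels[0] != "s3":
        raise ValueError(f"unexpected S3 endpoint: {endpoint!r}")
    return {"endpoint_url": f"https://{host}", "region": labels[1]}


print(parse_s3_endpoint("s3.us-west-004.backblazeb2.com"))
```

Some S3-compatible tools ask for the region separately from the endpoint, which is why the helper returns both.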

Before you exit the Backblaze console, you will set up an Application Key that allows Veeam to connect to and access your storage bucket securely. You give the Application Key a name and make some additional choices to finish setting it up. Finally, you will jot down some details for the Application Key, such as keyID, keyName, and applicationKey, which is essentially a passcode for the key. Be sure to record the applicationKey immediately after creating the key, because it is displayed only once and you won’t be able to view it in plaintext again.
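
Veeam asks for these values in its own dialog, but if you also script against the same bucket, it is safer to read the key from the environment than to paste it into source code. The following is a minimal sketch under that assumption: the environment variable names `B2_KEY_ID` and `B2_APPLICATION_KEY` are hypothetical, and the returned dictionary uses the access key and secret key parameter names that S3-compatible clients such as boto3 commonly expect.

```python
import os


def load_b2_credentials(env=os.environ):
    """Read keyID and applicationKey from the environment, not from code."""
    try:
        key_id = env["B2_KEY_ID"]                 # hypothetical variable name
        app_key = env["B2_APPLICATION_KEY"]       # hypothetical variable name
    except KeyError as missing:
        raise RuntimeError(f"missing environment variable {missing}") from None
    # With an S3-compatible client, keyID plays the role of the access key ID
    # and applicationKey the role of the secret access key.
    return {"aws_access_key_id": key_id, "aws_secret_access_key": app_key}


def redact(secret, visible=4):
    """Mask all but the first few characters so a secret can be logged."""
    return secret[:visible] + "*" * max(len(secret) - visible, 0)


creds = load_b2_credentials({"B2_KEY_ID": "example-key-id",
                             "B2_APPLICATION_KEY": "example-application-key"})
print(creds["aws_access_key_id"], redact(creds["aws_secret_access_key"]))
```

Passing a plain dictionary in place of `os.environ`, as in the last lines, keeps the helper easy to test without touching real credentials.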

Step 3: Add Backblaze B2 Cloud Storage as a Cloud Tier Repository in Veeam

Switching over to the Veeam console, you will log into your software and create a Cloud Tier repository to interface with Backblaze B2 Cloud Storage.

Before you do that, however, you need to have a local repository created. The tutorial assumes that you have one already and have been using Veeam to back up locally.

To set up your cloud tier, you will follow a few simple steps:

- Choose your object storage type.
- Give it a name.
- Enter your Backblaze S3 Endpoint value.

You will also be prompted to enter your credentials, which are the Application Key details you saved in step two. Before exiting that area, Veeam will test the connection to ensure it can reach your Bucket. The final stage in this step allows you to turn on Object Lock to keep your backup files safe.

Step 4: Create the Scale-Out Backup Repository in Veeam

Still working within the Veeam console, you will also set up a scale-out repository to handle backup data load. During this step, you will name your Veeam Scale-Out repository, choose a few options, select the Cloud Tier repository you just created in step three, and ensure that your files are backed up immediately.

Step 5: Create a Backup Job in Veeam

The final stage of our backup tutorial walks you through the process of setting up a backup job. You will continue working in Veeam to create a new backup job using cloud storage. In the video we show you a Virtual Machine backup, but you can create several other types of backup jobs as needed. You can then name your backup job, add the files you want to back up, and choose where you want to save them (in this case, the Scale-Out repository we just created).

You also have options to optimize storage and schedule your backup job to run as often as you like. Then, you can test it immediately to see how it goes.

We hope this video guide and brief explanation were useful in helping you get the most out of both Veeam and Backblaze. If you have thoughts for topics on future videos, sound off in the comments. And be sure to subscribe to our YouTube channel for more great content!

The post How to Back Up Veeam to the Cloud appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

Storing and Querying Analytical Data in Backblaze B2

2022-08-16 Greg Hamer

Post Syndicated from Greg Hamer original https://www.backblaze.com/blog/storing-and-querying-analytical-data-in-backblaze-b2/

Note: This blog is the result of a collaborative effort of the Backblaze Evangelism team members Andy Klein, Pat Patterson and Greg Hamer.

Have You Ever Used Backblaze B2 Cloud Storage for Your Data Analytics?

Backblaze customers find that Backblaze B2 Cloud Storage is optimal for a wide variety of use cases. However, one application that many teams might not yet have tried is using Backblaze B2 for data analytics. You may find that having a highly reliable pre-provisioned storage option like Backblaze B2 Cloud Storage for your data lakes can be a useful and very cost-effective alternative for your data analytic workloads.

This article is an introductory primer on getting started using Backblaze B2 for data analytics that uses our Drive Stats as the example of the data being analyzed. For readers new to data lakes, this article can help you get your own data lake up and going on Backblaze B2 Cloud Storage.

As you probably know, a commonly used technology for data analytics is SQL (Structured Query Language). Most people know SQL from databases. However, SQL can be used against collections of files stored outside of databases, now commonly referred to as data lakes. We will focus here on several options using SQL for analyzing Drive Stats data stored on Backblaze B2 Cloud Storage.

It should be noted that data lakes most frequently prove optimal for read-only or append-only datasets, whereas databases often remain optimal for “hot” data with active insert, update, and delete of individual rows, and especially updates of individual column values on individual rows.

We can only scratch the surface of storing, querying, and analyzing tabular data in a single blog post. So for this introductory article, we will:

Briefly explain the Drive Stats data.
Introduce open-source Trino as one option for executing SQL against the Drive Stats data.
Query the Drive Stats data in raw CSV format, then compare the enhanced performance after transforming the data into the open-source Apache Parquet format.

The sections below take a step-by-step approach including details on the performance improvements realized when implementing recommended data engineering options. We start with a demonstration of analysis of raw data, then progress through “data engineering” that transforms the data into formats that are optimal for accelerating repeated queries of the dataset. We conclude by highlighting our hosted, consolidated, complete Drive Stats dataset.

As mentioned earlier, this blog post is intended only as an introductory primer. In future blog posts, we will detail additional best practices and other common issues and opportunities with data analysis using Backblaze B2.

Backblaze Hard Drive Data and Stats (aka Drive Stats)

Drive Stats is an open-source data set of the daily metrics on the hard drives in Backblaze’s cloud storage infrastructure, with data going back to April 2013. Currently, Drive Stats comprises nearly 300 million records, occupying over 90GB of disk space in raw comma-separated values (CSV) format, rising by over 200,000 records, or about 75MB of CSV data, per day. Drive Stats is an append-only dataset, effectively logging daily statistics that, once written, are never updated or deleted.

The Drive Stats dataset is not quite “big data,” where datasets range from a few dozen terabytes to many zettabytes, but it is large enough that physical data architecture starts to have a significant effect on both the amount of space that the data occupies and how the data can be accessed.

At the end of each quarter, Backblaze creates a CSV file for each day of data, zips those 90 or so files together, and makes the compressed file available for download from a Backblaze B2 Bucket. While it’s easy to download and decompress a single file containing three months of data, this data architecture is not very flexible. With a little data engineering, though, it’s possible to make analytical data, such as the Drive Stats data set, available for modern data analysis tools to directly access from cloud storage, unlocking new opportunities for data analysis and data science.

Later, for comparison, we include a brief demonstration of performance of the data lake versus a traditional relational database. Architecturally, a difference between a data lake and a database is that databases integrate together both the query engine and the data storage. When data is either inserted or loaded into a database, the database has optimized internal storage structures it uses. Alternatively, with a data lake, the query engine and the data storage are separate. What we highlight below are basics for optimizing data storage in a data lake to enable the query engine to deliver the fastest query response times.

As with all data analysis, it is helpful to understand details of what the data represents. Before showing results, let’s take a deeper dive into the nature of the Drive Stats data. (For readers interested in first reviewing outcomes and improved query performance results, please skip ahead to the later sections “Compressed CSV” and “Enter Apache Parquet.”)

Navigating the Drive Stats Data

At Backblaze we collect a Drive Stats record from each hard drive, each day, containing the following data:

date: the date of collection.
serial_number: the unique serial number of the drive.
model: the manufacturer’s model number of the drive.
capacity_bytes: the drive’s capacity, in bytes.
failure: 1 if this was the last day that the drive was operational before failing, 0 if all is well.
A collection of SMART attributes. The number of attributes collected has risen over time; currently we store 87 SMART attributes in each record, each one in both raw and normalized form, with field names of the form smart_n_normalized and smart_n_raw, where n is between 1 and 255.

In total, each record currently comprises 179 fields of data describing the state of an individual hard drive on a given day: the five fields listed above plus 87 SMART attributes in both raw and normalized form (5 + 87 × 2 = 179).

Comma-Separated Values, a Lingua Franca for Tabular Data

A CSV file is a delimited text file that, as its name implies, uses a comma to separate values. Typically, the first line of a CSV file is a header containing the field names for the data, separated by commas. The remaining lines in the file hold the data: one line per record, with each line containing the field values, again separated by commas.

Here’s a subset of the Drive Stats data represented as CSV. We’ve omitted most of the SMART attributes to make the records more manageable.

date,serial_number,model,capacity_bytes,failure,smart_1_normalized,smart_1_raw
2022-01-01,ZLW18P9K,ST14000NM001G,14000519643136,0,73,20467240
2022-01-01,ZLW0EGC7,ST12000NM001G,12000138625024,0,84,228715872
2022-01-01,ZA1FLE1P,ST8000NM0055,8001563222016,0,82,157857120
2022-01-01,ZA16NQJR,ST8000NM0055,8001563222016,0,84,234265456
2022-01-01,1050A084F97G,TOSHIBA MG07ACA14TA,14000519643136,0,100,0
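
As a rough illustration of how little machinery this format needs, here is a minimal sketch that parses a few of the sample records above with Python's standard csv module. The SAMPLE constant and the read_records helper are names made up for this example; they are not part of any Backblaze tooling.

```python
import csv
import io

# Sample copied from the Drive Stats excerpt above (most SMART columns omitted).
SAMPLE = """\
date,serial_number,model,capacity_bytes,failure,smart_1_normalized,smart_1_raw
2022-01-01,ZLW18P9K,ST14000NM001G,14000519643136,0,73,20467240
2022-01-01,ZLW0EGC7,ST12000NM001G,12000138625024,0,84,228715872
2022-01-01,1050A084F97G,TOSHIBA MG07ACA14TA,14000519643136,0,100,0
"""


def read_records(text):
    """Parse CSV text into a list of dicts keyed by the header's field names."""
    return list(csv.DictReader(io.StringIO(text)))


records = read_records(SAMPLE)
# Every value arrives as a string, so numeric fields need explicit conversion.
total_bytes = sum(int(r["capacity_bytes"]) for r in records)
print(len(records), total_bytes)
```

Note that the reader hands back every field as text. CSV carries no type information, which is why numeric fields have to be converted explicitly before they can be summed.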

Currently, we create a CSV file for each day’s data, comprising a record for each drive that was operational at the beginning of that day. The CSV files are each named with the appropriate date in year-month-day order, for example, 2022-06-28.csv. As mentioned above, we make each quarter’s data available as a ZIP file containing the CSV files.

At the beginning of the last Drive Stats quarter, Jan 1, 2022, we were spinning over 200,000 hard drives, so each daily file contained over 200,000 lines and occupied nearly 75MB of disk space. The ZIP file containing the Drive Stats data for the first quarter of 2022 compressed 90 files totaling 6.63GB of CSV data to a single 1.06GB file made available for download here.
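
Because the daily files are named in year-month-day order, sorting them by name also sorts them by date. The sketch below lists the daily CSV files in a quarterly archive using Python's standard zipfile module. The daily_csv_names helper is hypothetical, and the tiny in-memory archive only stands in for the real download, whose exact internal layout we are assuming here.

```python
import io
import zipfile


def daily_csv_names(zip_file):
    """Return the names of the daily CSV files in a quarterly ZIP, in date order."""
    with zipfile.ZipFile(zip_file) as archive:
        names = [n for n in archive.namelist() if n.endswith(".csv")]
    # Year-month-day file names sort chronologically as plain strings.
    return sorted(names)


# Build a tiny in-memory archive standing in for the real quarterly download.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    for day in ("2022-01-02.csv", "2022-01-01.csv"):
        archive.writestr(day, "date,serial_number\n")
    archive.writestr("README.txt", "not a data file")
buffer.seek(0)

names = daily_csv_names(buffer)
print(names)
```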

Big Data Analytics in the Cloud with Trino

Zipped CSV files allow users to easily download, inspect, and analyze the data locally, but a new generation of tools allows us to explore and query data in situ on Backblaze B2 and other cloud storage platforms. One example is the open-source Trino query engine (formerly known as Presto SQL). Trino can natively query data in Backblaze B2, Cassandra, MySQL, and many other data sources without copying that data into its own dedicated store.

A powerful capability of Trino is that it is a distributed query engine and offers what is sometimes referred to as massively parallel processing (MPP). Thus, adding more nodes in your Trino compute cluster consistently delivers dramatically shorter query execution times. Faster results are always desirable. We achieved the results we report below running Trino on only a single node.

Note: If you are unfamiliar with Trino, the open-source project was previously known as Presto and leverages the Hadoop ecosystem.

In preparing this blog post, our team used Brian Olsen’s excellent Hive connector over MinIO file storage tutorial as a starting point for integrating Trino with Backblaze B2. The tutorial environment includes a preconfigured Docker Compose environment comprising the Trino Docker image and other required services for working with data in Backblaze B2. We brought up the environment in Docker Desktop, variously on ThinkPads and MacBook Pros.

As a first step, we downloaded the data set for the most recent quarter, unzipped it to our local disks, and then finally reuploaded the unzipped CSV into Backblaze B2 buckets. As mentioned above, the uncompressed CSV data occupies 6.63GB of storage, so we confined our initial explorations to just a single day’s data: over 200,000 records, occupying 72.8MB.

A Word About Apache Hive

Trino accesses analytical data in Backblaze B2 and other cloud storage platforms via its Hive connector. Quoting from the Trino documentation:

The Hive connector allows querying data stored in an Apache Hive data warehouse. Hive is a combination of three components:

Data files in varying formats, that are typically stored in the Hadoop Distributed File System (HDFS) or in object storage systems such as Amazon S3.
Metadata about how the data files are mapped to schemas and tables. This metadata is stored in a database, such as MySQL, and is accessed via the Hive metastore service.
A query language called HiveQL. This query language is executed on a distributed computing framework such as MapReduce or Tez.

Trino only uses the first two components: the data and the metadata. It does not use HiveQL or any part of Hive’s execution environment.

The Hive connector tutorial includes Docker images for the Hive metastore service (HMS) and MariaDB, so it’s a convenient way to explore this functionality with Backblaze B2.

Configuring Trino for Backblaze B2

The tutorial uses MinIO, an open-source implementation of the Amazon S3 API, so it was straightforward to adapt the sample MinIO configuration to Backblaze B2’s S3 Compatible API by just replacing the endpoint and credentials. Here’s the b2.properties file we created:

connector.name=hive
hive.metastore.uri=thrift://hive-metastore:9083
hive.s3.path-style-access=true
hive.s3.endpoint=https://s3.us-west-004.backblazeb2.com
hive.s3.aws-access-key=
hive.s3.aws-secret-key=
hive.non-managed-table-writes-enabled=true
hive.s3select-pushdown.enabled=false
hive.storage-format=CSV
hive.allow-drop-table=true

Similarly, we edited the Hive configuration files, again replacing the MinIO configuration with the corresponding Backblaze B2 values. Here’s a sample core-site.xml:

<?xml version="1.0"?>
<configuration>

    <property>
        <name>fs.defaultFS</name>
        <value>s3a://b2-trino-getting-started</value>
    </property>


    <!-- B2 properties -->
    <property>
        <name>fs.s3a.connection.ssl.enabled</name>
        <value>true</value>
    </property>

    <property>
        <name>fs.s3a.endpoint</name>
        <value>https://s3.us-west-004.backblazeb2.com</value>
    </property>

    <property>
        <name>fs.s3a.access.key</name>
        <value><my b2 application key id></value>
    </property>

    <property>
        <name>fs.s3a.secret.key</name>
        <value><my b2 application key></value>
    </property>

    <property>
        <name>fs.s3a.path.style.access</name>
        <value>true</value>
    </property>

    <property>
        <name>fs.s3a.impl</name>
        <value>org.apache.hadoop.fs.s3a.S3AFileSystem</value>
    </property>

</configuration>

We made a similar set of edits to metastore-site.xml and restarted the Docker instances so our changes took effect.

Uncompressed CSV

Our first test validated creating a table and running a query on a single-day CSV data set. Hive tables are configured with the directory containing the actual data files, so we uploaded 2022-01-01.csv from a local disk to data_20220101_csv/2022-01-01.csv in a Backblaze B2 bucket, opened the Trino CLI, and created a schema and a table:

CREATE SCHEMA b2.ds
WITH (location = 's3a://b2-trino-getting-started/');

USE b2.ds;

CREATE TABLE jan1_csv (
    date VARCHAR,
    serial_number VARCHAR,
    model VARCHAR,
    capacity_bytes VARCHAR,
    failure VARCHAR,
    smart_1_normalized VARCHAR,
    smart_1_raw VARCHAR,
    ...
    smart_255_normalized VARCHAR,
    smart_255_raw VARCHAR)
WITH (format = 'CSV',
    skip_header_line_count = 1,
    external_location = 's3a://b2-trino-getting-started/data_20220101_csv');

Unfortunately, the Trino Hive connector only supports the VARCHAR data type when accessing CSV data, but, as we’ll see in a moment, we can use the CAST function in queries to convert character data to numeric and other types.

Now to run some queries! A good test is to check if all the data is there:

trino:ds> SELECT COUNT(*) FROM jan1_csv;
 _col0  
--------
 206954 
(1 row)

Query 20220629_162533_00024_qy4c6, FINISHED, 1 node
Splits: 8 total, 8 done (100.00%)
8.23 [207K rows, 69.4MB] [25.1K rows/s, 8.43MB/s]

Note: If you’re wondering about the discrepancy between the size of the CSV file (72.8MB) and the amount of data read by Trino (69.4MB), it comes down to two different uses of the “MB” abbreviation. macOS, for instance, interprets MB as a megabyte, 1,000,000 bytes, while Trino is reporting mebibytes, 1,048,576 bytes. Strictly speaking, Trino should use the abbreviation MiB. Pat opened an issue for this (with a goal of fixing it and submitting a pull request to the Trino project).
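
The unit conversion is easy to check. Here is a small sketch (the mb_to_mib helper name is ours) showing that 72.8 decimal megabytes is the same amount of data as roughly 69.4 mebibytes:

```python
MEGABYTE = 1_000_000  # decimal megabyte (MB)
MEBIBYTE = 1_048_576  # binary mebibyte (MiB), 2**20 bytes


def mb_to_mib(megabytes):
    """Convert a size in decimal megabytes to binary mebibytes."""
    return megabytes * MEGABYTE / MEBIBYTE


print(round(mb_to_mib(72.8), 1))  # 69.4
```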

Now let’s see how many drives failed that day, grouped by the drive model:

trino:ds> SELECT model, COUNT(*) as failures 
       -> FROM jan1_csv 
       -> WHERE failure = 1 
       -> GROUP BY model 
       -> ORDER BY failures DESC;
       model        | failures 
--------------------+----------
 TOSHIBA MQ01ABF050 |        1 
 ST4000DM005        |        1 
 ST8000NM0055       |        1 
(3 rows)

Query 20220629_162609_00025_qy4c6, FINISHED, 1 node
Splits: 17 total, 17 done (100.00%)
8.23 [207K rows, 69.4MB] [25.1K rows/s, 8.43MB/s]

Notice that the query execution time is identical between the two queries. This makes sense–the time taken to run the query is dominated by the time required to download the data from Backblaze B2.

Finally, we can use the CAST function with SUM and ROUND to see how many exabytes of storage we were spinning on that day:

trino:ds> SELECT ROUND(SUM(CAST(capacity_bytes AS bigint))/1e+18, 2) FROM jan1_csv;
 _col0 
-------
  2.25 
(1 row)

Query 20220629_172703_00047_qy4c6, FINISHED, 1 node
Splits: 12 total, 12 done (100.00%)
7.83 [207K rows, 69.4MB] [26.4K rows/s, 8.86MB/s]

Although these queries may seem slow, note that they run against raw data. The approach we are highlighting here with Drive Stats data can also be used for querying data in log files. As new records are written to an append-only dataset, they immediately appear as new rows in query results. This is very powerful for both real-time and near real-time analysis, and faster performance can be achieved by scaling out the Trino cluster. Remember, Trino is a distributed query engine; for this demonstration, we have limited it to running on just a single node.

Compressed CSV

This is pretty neat, but not exactly fast. Extrapolating, we might expect it to take about 12 minutes to run a query against a whole quarter of Drive Stats data.
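
The extrapolation is simple arithmetic, assuming query time grows linearly with the amount of data downloaded and taking a quarter to be about 90 daily files. The helper name below is made up for this example:

```python
def extrapolate_minutes(seconds_per_daily_file, days):
    """Estimate total query time in minutes, assuming time scales linearly with data."""
    return seconds_per_daily_file * days / 60


# 8.23 seconds for one day's CSV, times roughly 90 days in a quarter.
print(round(extrapolate_minutes(8.23, 90), 1))  # 12.3
```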

Can we improve performance? Absolutely–we simply need to reduce the amount of data that needs to be downloaded for each query!

Commonplace in the world of data analytics are data pipelines, often known as ETL for Extract, Transform, and Load. Where data is repeatedly queried, it is often advantageous to “transform” data from the raw form that it originates in into some format more optimized for the repeated queries that follow through the next stages of that data’s life cycle.

For our next test we performed an elementary transformation of the data using a lossless compression of the CSV data with Hive’s preferred gzip format, resulting in an 11.7MB file, 2022-01-01.csv.gz. After uploading the compressed file to data_20220101_csv_gz/2022-01-01.csv.gz, we created a second table, copying the schema from the first:

CREATE TABLE jan1_csv_gz (
	LIKE jan1_csv
)
WITH (FORMAT = 'CSV',
    EXTERNAL_LOCATION = 's3a://b2-trino-getting-started/data_20220101_csv_gz');
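
The post does not say which tool produced the gzip file, and the standard gzip command-line utility would do the job. For readers who prefer to stay in Python, here is one possible sketch using only the standard library. The gzip_file helper is hypothetical, not something from the original workflow.

```python
import gzip
import shutil


def gzip_file(source_path, target_path=None):
    """Losslessly compress source_path with gzip and return the output path."""
    target_path = target_path or source_path + ".gz"
    with open(source_path, "rb") as src, gzip.open(target_path, "wb") as dst:
        # Stream the data in chunks rather than loading the whole file into memory.
        shutil.copyfileobj(src, dst)
    return target_path
```

Calling gzip_file on a daily CSV file writes a .csv.gz file next to it, ready to upload to its own directory in the bucket.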

Trying the failure count query:

trino:ds> SELECT model, COUNT(*) as failures 
       -> FROM jan1_csv_gz 
       -> WHERE failure = 1 
       -> GROUP BY model 
       -> ORDER BY failures DESC;
       model        | failures 
--------------------+----------
 TOSHIBA MQ01ABF050 |        1 
 ST8000NM0055       |        1 
 ST4000DM005        |        1 
(3 rows)

Query 20220629_162713_00027_qy4c6, FINISHED, 1 node
Splits: 15 total, 15 done (100.00%)
2.71 [207K rows, 11.1MB] [76.4K rows/s, 4.1MB/s]

As you might expect, given that Trino has to download less than ⅙ as much data as previously, the query time fell dramatically–from just over 8 seconds to under 3 seconds. Can we do even better than this?

Enter Apache Parquet

The issue with running this kind of analytical query is that it often results in a “full table scan”–Trino has to read the model and failure fields from every record to execute the query. The row-oriented layout of CSV data means that Trino ends up reading the entire file. We can get around this by using a file format designed specifically for analytical workloads.

While CSV files comprise a line of text for each record, Parquet is a column-oriented, binary file format, storing the binary values for each column contiguously. Here’s a simple visualization of the difference between row and column orientation:

[Figures: table representation, row orientation, and column orientation of the same data]

Parquet also implements run-length encoding and other compression techniques. Where a series of records have the same value for a given field, the Parquet file need only store the value and the number of repetitions.

The result is a compact file format well suited for analytical queries.
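
The difference between the two layouts, and the idea behind run-length encoding, can be sketched in a few lines of Python. This is a toy illustration only: the sample values and the helper names to_columns and run_length_encode are made up for this example, and Parquet's actual on-disk encoding is binary and considerably more elaborate.

```python
from itertools import groupby

# A few records as rows: (date, model, failure). Values are illustrative only.
rows = [
    ("2022-01-01", "ST8000NM0055", 0),
    ("2022-01-01", "ST8000NM0055", 0),
    ("2022-01-01", "ST4000DM005", 1),
]


def to_columns(records):
    """Pivot row-oriented records into one contiguous list per column."""
    return [list(column) for column in zip(*records)]


def run_length_encode(values):
    """Collapse consecutive repeats into (value, repetitions) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


dates, models, failures = to_columns(rows)
print(run_length_encode(dates))   # [('2022-01-01', 3)]
print(run_length_encode(models))  # [('ST8000NM0055', 2), ('ST4000DM005', 1)]
```

Once the values of a column sit next to each other, long runs of identical values (such as the date in a daily file) collapse to a single pair, and a query that needs only one column can skip the others entirely.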

There are many tools to manipulate tabular data from one format to another. In this case, we wrote a very simple Python script that used the pyarrow library to do the job:

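import pyarrow.csv as csv
import pyarrow.parquet as parquet

filename = '2022-01-01.csv'

# Read the CSV into an Arrow table (column types are inferred), then write it
# out as a Parquet file alongside the original.
table = csv.read_csv(filename)
parquet.write_table(table, filename.replace('.csv', '.parquet'))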

The resulting Parquet file occupies 12.8MB–only 1.1MB more than the gzip file. Again, we uploaded the resulting file and created a table in Trino.

CREATE TABLE jan1_parquet (
    date DATE,
    serial_number VARCHAR,
    model VARCHAR,
    capacity_bytes BIGINT,
    failure TINYINT,
    smart_1_normalized BIGINT,
    smart_1_raw BIGINT,
    ...
    smart_255_normalized BIGINT,
    smart_255_raw BIGINT)
WITH (FORMAT = 'PARQUET',
    EXTERNAL_LOCATION = 's3a://b2-trino-getting-started/data_20220101_parquet');

Note that the conversion to Parquet automatically formatted the data using appropriate types, which we used in the table definition.

Let’s run a query and see how Parquet fares against compressed CSV:

trino:ds> SELECT model, COUNT(*) as failures 
       -> FROM jan1_parquet 
       -> WHERE failure = 1 
       -> GROUP BY model 
       -> ORDER BY failures DESC;
       model        | failures 
--------------------+----------
 TOSHIBA MQ01ABF050 |        1 
 ST4000DM005        |        1 
 ST8000NM0055       |        1 
(3 rows)

Query 20220629_163018_00031_qy4c6, FINISHED, 1 node
Splits: 15 total, 15 done (100.00%)
0.78 [207K rows, 334KB] [265K rows/s, 427KB/s]

The test query is executed in well under a second! Looking at the last line of output, we can see that the same number of rows were read, but only 334KB of data was retrieved. Trino was able to retrieve just the two columns it needed, out of the 179 columns in the file, to run the query.

Similar analytical queries execute just as efficiently. Calculating the total amount of storage in exabytes:

trino:ds> SELECT ROUND(SUM(capacity_bytes)/1e+18, 2) FROM jan1_parquet;
 _col0 
-------
  2.25 
(1 row)

Query 20220629_163058_00033_qy4c6, FINISHED, 1 node
Splits: 10 total, 10 done (100.00%)
0.83 [207K rows, 156KB] [251K rows/s, 189KB/s]

What was the capacity of the largest drive in terabytes?

trino:ds> SELECT max(capacity_bytes)/1e+12 FROM jan1_parquet;
      _col0      
-----------------
 18.000207937536 
(1 row)

Query 20220629_163139_00034_qy4c6, FINISHED, 1 node
Splits: 10 total, 10 done (100.00%)
0.80 [207K rows, 156KB] [259K rows/s, 195KB/s]

Parquet’s columnar layout excels with analytical workloads, but if we try a query more suited to an operational database, Trino has to read the entire file, as we would expect:

trino:ds> SELECT * FROM jan1_parquet WHERE serial_number = 'ZLW18P9K';
    date    | serial_number |     model     | capacity_bytes | failure
------------+---------------+---------------+----------------+--------
 2022-01-01 | ZLW18P9K      | ST14000NM001G | 14000519643136 |       0
(1 row)

Query 20220629_163206_00035_qy4c6, FINISHED, 1 node
Splits: 5 total, 5 done (100.00%)
2.05 [207K rows, 12.2MB] [101K rows/s, 5.95MB/s]

Scaling Up

After validating our Trino configuration with just a single day’s data, our next step up was to create a Parquet file containing an entire quarter. The file weighed in at 1.0GB, a little smaller than the zipped CSV.

Here’s the failed drives query for the entire quarter, limited to the top 10 results:

trino:ds> SELECT model, COUNT(*) as failures 
       -> FROM q1_2022_parquet 
       -> WHERE failure = 1 
       -> GROUP BY model 
       -> ORDER BY failures DESC 
       -> LIMIT 10;
        model         | failures 
----------------------+----------
 ST4000DM000          |      117 
 TOSHIBA MG07ACA14TA  |       88 
 ST8000NM0055         |       86 
 ST12000NM0008        |       73 
 ST8000DM002          |       38 
 ST16000NM001G        |       24 
 ST14000NM001G        |       24 
 HGST HMS5C4040ALE640 |       21 
 HGST HUH721212ALE604 |       21 
 ST12000NM001G        |       20 
(10 rows)

Query 20220629_183338_00050_qy4c6, FINISHED, 1 node
Splits: 43 total, 43 done (100.00%)
3.38 [18.8M rows, 15.8MB] [5.58M rows/s, 4.68MB/s]

Of course, those are absolute failure numbers; they don’t take account of how many of each drive model are in use. We can construct a more complex query that tells us the percentages of failed drives, by model:

trino:ds> SELECT drives.model AS model, drives.drives AS drives, 
       ->   failures.failures AS failures, 
       ->   ROUND((CAST(failures AS double)/drives)*100, 6) AS percentage
       -> FROM
       -> (
       ->   SELECT model, COUNT(*) as drives 
       ->   FROM q1_2022_parquet 
       ->   GROUP BY model
       -> ) AS drives
       -> RIGHT JOIN
       -> (
       ->   SELECT model, COUNT(*) as failures 
       ->   FROM q1_2022_parquet 
       ->   WHERE failure = 1 
       ->   GROUP BY model
       -> ) AS failures
       -> ON drives.model = failures.model
       -> ORDER BY percentage DESC
       -> LIMIT 10;
        model         | drives | failures | percentage 
----------------------+--------+----------+------------
 ST12000NM0117        |    873 |        1 |   0.114548 
 ST10000NM001G        |   1028 |        1 |   0.097276 
 HGST HUH728080ALE604 |   4504 |        3 |   0.066607 
 TOSHIBA MQ01ABF050M  |  26231 |       13 |    0.04956 
 TOSHIBA MQ01ABF050   |  24765 |       12 |   0.048455 
 ST4000DM005          |   3331 |        1 |   0.030021 
 WDC WDS250G2B0A      |   3338 |        1 |   0.029958 
 ST500LM012 HN        |  37447 |       11 |   0.029375 
 ST12000NM0007        | 118349 |       19 |   0.016054 
 ST14000NM0138        | 144333 |       17 |   0.011778 
(10 rows)

Query 20220629_191755_00010_tfuuz, FINISHED, 1 node
Splits: 82 total, 82 done (100.00%)
8.70 [37.7M rows, 31.6MB] [4.33M rows/s, 3.63MB/s]

This query took more than twice as long as the last one! Again, data transfer time is the limiting factor: Trino downloads the data for each subquery. A real-world deployment would take advantage of the Hive connector’s storage caching feature to avoid repeatedly retrieving the same data.
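
For readers who want to check the percentage column by hand, the same arithmetic can be sketched in plain Python. This is only an illustration of the calculation, not of how Trino executes the query. The failure_percentages helper is a name made up for this example, and it counts both totals in a single pass over the records rather than with two subqueries.

```python
from collections import Counter


def failure_percentages(records):
    """Return {model: (records, failures, percentage)} for models with failures.

    `records` is an iterable of (model, failure) pairs, with failure 0 or 1.
    """
    totals = Counter()
    failed = Counter()
    for model, failure in records:
        totals[model] += 1
        failed[model] += failure
    # Like the RIGHT JOIN above, keep only models with at least one failure.
    return {
        model: (totals[model], count, round(count / totals[model] * 100, 6))
        for model, count in failed.items()
        if count > 0
    }


# 873 records with one failure, matching the first row of the result above.
sample = [("ST12000NM0117", 0)] * 872 + [("ST12000NM0117", 1)]
print(failure_percentages(sample))
```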

Picking the Right Tool for the Job

You might be wondering how a relational database would stack up against the Trino/Parquet/Backblaze B2 combination. As a quick test, we installed PostgreSQL 14 on a MacBook Pro, loaded the same quarter’s data into a table, and ran the same set of queries:

Count Rows

sql_stmt=# \timing
Timing is on.
sql_stmt=# SELECT COUNT(*) FROM q1_2022;

  count   
----------
 18845260
(1 row)

Time: 1579.532 ms (00:01.580)

Absolute Number of Failures

sql_stmt=# SELECT model, COUNT(*) as failures
           FROM q1_2022
           WHERE failure = 't'
           GROUP BY model
           ORDER BY failures DESC
           LIMIT 10;

        model         | failures 
----------------------+----------
 ST4000DM000          |      117
 TOSHIBA MG07ACA14TA  |       88
 ST8000NM0055         |       86
 ST12000NM0008        |       73
 ST8000DM002          |       38
 ST14000NM001G        |       24
 ST16000NM001G        |       24
 HGST HMS5C4040ALE640 |       21
 HGST HUH721212ALE604 |       21
 ST12000NM001G        |       20
(10 rows)

Time: 2052.019 ms (00:02.052)

Relative Number of Failures

sql_stmt=# SELECT drives.model AS model, drives.drives AS drives,
             failures.failures,
             ROUND((CAST(failures AS numeric)/drives)*100, 6) AS percentage
           FROM
           (
             SELECT model, COUNT(*) as drives
             FROM q1_2022
             GROUP BY model
           ) AS drives
           RIGHT JOIN
           (
             SELECT model, COUNT(*) as failures
             FROM q1_2022
             WHERE failure = 't'
             GROUP BY model
           ) AS failures
           ON drives.model = failures.model
           ORDER BY percentage DESC
           LIMIT 10;
        model         | drives | failures | percentage 
----------------------+--------+----------+------------
 ST12000NM0117        |    873 |        1 |   0.114548
 ST10000NM001G        |   1028 |        1 |   0.097276
 HGST HUH728080ALE604 |   4504 |        3 |   0.066607
 TOSHIBA MQ01ABF050M  |  26231 |       13 |   0.049560
 TOSHIBA MQ01ABF050   |  24765 |       12 |   0.048455
 ST4000DM005          |   3331 |        1 |   0.030021
 WDC WDS250G2B0A      |   3338 |        1 |   0.029958
 ST500LM012 HN        |  37447 |       11 |   0.029375
 ST12000NM0007        | 118349 |       19 |   0.016054
 ST14000NM0138        | 144333 |       17 |   0.011778
(10 rows)

Time: 3831.924 ms (00:03.832)

Retrieve a Single Record by Serial Number and Date

Modifying the query, since we have an entire quarter’s data:

sql_stmt=# SELECT * FROM q1_2022 WHERE serial_number = 'ZLW18P9K' AND date = '2022-01-01';
    date    | serial_number |     model     | capacity_bytes | failure
------------+---------------+---------------+----------------+--------
 2022-01-01 | ZLW18P9K      | ST14000NM001G | 14000519643136 | f
(1 row)

Time: 1690.091 ms (00:01.690)

For comparison, we tried to run the same query against the quarter’s data in Parquet format, but Trino crashed with an out of memory error after 58 seconds. Clearly some tuning of the default configuration is required!

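Bringing the numbers together for the quarterly data sets, with all times in seconds: the absolute failures query took 3.38 in Trino versus 2.05 in PostgreSQL, the relative failures query took 8.70 versus 3.83, and the single-record lookup took 1.69 in PostgreSQL while Trino ran out of memory after 58.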

PostgreSQL is faster for most operations, but not by much, especially considering that its data is on the local SSD, rather than Backblaze B2!

It’s worth mentioning that there are yet more tuning optimizations that we have not demonstrated in this exercise. For instance, the Trino Hive connector supports storage caching, which yields further performance gains by avoiding repeatedly retrieving the same data from Backblaze B2. Further, Trino is a distributed query engine with a horizontally scalable architecture, so it can also deliver shorter query run times as you add more nodes to your Trino compute cluster. We have limited all timings in this demonstration to Trino running on just a single node.

Partitioning Your Data Lake

Our final exercise was to create a single Drive Stats dataset containing all nine years of Drive Stats data. As stated above, at the time of writing the full Drive Stats dataset comprises nearly 300 million records, occupying over 90GB of disk space when in raw CSV format, rising by over 200,000 records per day, or about 75MB of CSV data.

As the dataset grows in size, an additional data engineering best practice is to include partitions.

In the introduction we mentioned that databases use optimized internal storage structures. Foremost among these are indexes. Data lakes have limited support for indexes. Data lakes do, however, support partitions. Data lake partitions are functionally similar to what databases alternately refer to as either a primary key index or index-organized tables. Regardless of the name, they effectively achieve faster data retrieval by having the data itself physically sorted. Since Drive Stats is append-only, when sorting on a date field, new records are appended to the dataset.

Having the data physically sorted greatly aids retrieval in cases that are known as range queries. To achieve fastest retrieval on a given query, it is important to only retrieve data that resolves true on the predicate in the WHERE clause. In the case of Drive Stats, for a query on only a single month or several consecutive months we get the fastest time to the result if we can read only the data for these months. Without partitioning Trino would need to do a full table scan, resulting in slower response due to the overhead of reading records for which the WHERE clause logic resolves to false. Organizing the Drive Stats data into partitions enables Trino to efficiently skip records that resolve the WHERE clause to false. Thus with partitions, many queries are far more efficient and incur the read cost only of those records whose WHERE clause logic resolves to true.
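To make the idea concrete, here is a minimal Python sketch of partition pruning. It is an illustration only, not Trino’s implementation: the helper names `parse_partition` and `prune` are hypothetical, and the sketch assumes prefixes that encode partition values as `key=value` path segments.

```python
def parse_partition(prefix):
    """Extract partition values, e.g. {'year': 2021, 'month': 12}, from a prefix."""
    values = {}
    for part in prefix.strip("/").split("/"):
        if "=" in part:
            key, value = part.split("=", 1)
            values[key] = int(value)
    return values


def prune(prefixes, year, months):
    """Keep only the prefixes whose partition values can satisfy the predicate."""
    wanted = set(months)
    kept = []
    for prefix in prefixes:
        values = parse_partition(prefix)
        if values.get("year") == year and values.get("month") in wanted:
            kept.append(prefix)
    return kept


# Two years of monthly partitions (hypothetical listing for illustration).
all_prefixes = [
    f"/drivestats/year={y}/month={m}/"
    for y in (2021, 2022)
    for m in range(1, 13)
]

# Equivalent of: WHERE year = 2022 AND month = 1
selected = prune(all_prefixes, year=2022, months=[1])
print(selected)
```

Only the files under the selected prefixes need to be read; the other 23 partitions are skipped without opening a single file.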

Our final transformation required a tweak to the Python script to iterate over all of the Drive Stats CSV files, writing Parquet files partitioned by year and month, so the files have prefixes of the form:

/drivestats/year={year}/month={month}/

For example:

/drivestats/year=2021/month=12/

The number of SMART attributes reported can change from one day to the next, and a single Parquet file can have only one schema, so there are one or more files with each prefix, named:

{year}-{month}-{index}.parquet

For example:

2021-12-1.parquet
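The naming scheme can be sketched in a few lines of Python. This is not the script we used, just a minimal illustration under stated assumptions: the function names are hypothetical, the month is assumed not to be zero-padded, the index is assumed to start at 1, and a new file is assumed to start whenever a day’s set of columns differs from the previous day’s.

```python
def parquet_key(year, month, index):
    """Build the object key for one Parquet file in a year/month partition."""
    prefix = f"/drivestats/year={year}/month={month}/"
    name = f"{year}-{month}-{index}.parquet"
    return prefix + name


def assign_file_indexes(daily_columns, first_index=1):
    """Give each day a file index, starting a new file when the schema changes.

    daily_columns holds one tuple of column names per day, in date order.
    """
    indexes = []
    current = first_index
    previous = None
    for columns in daily_columns:
        if previous is not None and columns != previous:
            current += 1  # schema changed, so this day goes in a new file
        indexes.append(current)
        previous = columns
    return indexes


print(parquet_key(2021, 12, 1))

# Three days of data; the third day reports an extra SMART attribute.
days = [
    ("serial_number", "smart_1_raw"),
    ("serial_number", "smart_1_raw"),
    ("serial_number", "smart_1_raw", "smart_255_raw"),
]
print(assign_file_indexes(days))
```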

Again, we uploaded the resulting files and created a table in Trino.

CREATE TABLE drivestats (
    serial_number VARCHAR,
    model VARCHAR,
    capacity_bytes BIGINT,
    failure TINYINT,
    smart_1_normalized BIGINT,
    smart_1_raw BIGINT,
    ...
    smart_255_normalized BIGINT,
    smart_255_raw BIGINT,
    day SMALLINT,
    year SMALLINT,
    month SMALLINT
)
WITH (format = 'PARQUET',
      PARTITIONED_BY = ARRAY['year', 'month'],
      EXTERNAL_LOCATION = 's3a://b2-trino-getting-started/drivestats-parquet');

Note that the conversion to Parquet automatically formatted the data using appropriate types, which we used in the table definition.

This command tells Trino to scan for partition files.

CALL system.sync_partition_metadata('ds', 'drivestats', 'FULL');

Let’s run a query and see the performance against the full Drive Stats dataset in Parquet format, partitioned by month:

trino:ds> SELECT COUNT(*) FROM drivestats;
   _col0   
-----------
296413574 
(1 row)

Query 20220707_182743_00055_tshdf, FINISHED, 1 node
Splits: 412 total, 412 done (100.00%)
15.84 [296M rows, 5.63MB] [18.7M rows/s, 364KB/s]

It takes 16 seconds to count the total number of records, reading only 5.6MB of the 15.3GB total data.

Next, let’s run a query against just one month’s data:

trino:ds> SELECT COUNT(*) FROM drivestats WHERE year = 2022 AND month = 1;
  _col0  
---------
 6415842 
(1 row)

Query 20220707_184801_00059_tshdf, FINISHED, 1 node
Splits: 16 total, 16 done (100.00%)
0.85 [6.42M rows, 56KB] [7.54M rows/s, 65.7KB/s]

Counting the records for a given month takes less than a second, retrieving just 56KB of data–partitioning is working!

Now we have the entire Drive Stats data set loaded into Backblaze B2 in an efficient format and layout for running queries. Our next blog post will look at some of the queries we’ve run to clean up the data set and gain insight into nine years of hard drive metrics.

Conclusion

We hope that this article inspires you to try using Backblaze for your data analytics workloads if you’re not already doing so, and that it also serves as a useful primer to help you set up your own data lake using Backblaze B2 Cloud Storage. Our Drive Stats data is just one example of the type of data set that can be used for data analytics on Backblaze B2.

Hopefully, you too will find that Backblaze B2 Cloud Storage can be a useful, powerful, and very cost effective option for your data lake workloads.

If you’d like to get started working with analytical data in Backblaze B2, sign up here for 10 GB storage, free of charge, and get to work. If you’re already storing and querying analytical data in Backblaze B2, please let us know in the comments what tools you’re using and how it’s working out for you!

If you already work with Trino (or other data lake analytic engines), and would like connection credentials for our partitioned, Parquet, complete Drive Stats data set that is now hosted on Backblaze B2 Cloud Storage, please contact us at [email protected].
Future blog posts focused on Drive Stats and analytics will be using this complete Drive Stats dataset.

Similarly, please let us know if you would like to run a proof of concept hosting your own data in a Backblaze B2 data lake and would like the assistance of the Backblaze Developer Evangelism team.

And lastly, if you think this article may be of interest to your colleagues, we’d very much appreciate you sharing it with them.

The post Storing and Querying Analytical Data in Backblaze B2 appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

Backblaze Drive Stats for Q2 2022

2022-08-02

Post Syndicated from original https://www.backblaze.com/blog/backblaze-drive-stats-for-q2-2022/

As of the end of Q2 2022, Backblaze was monitoring 219,444 hard drives and SSDs in our data centers around the world. Of that number, 4,020 are boot drives, with 2,558 being SSDs, and 1,462 being HDDs. Later this quarter, we’ll review our SSD collection. Today, we’ll focus on the 215,424 data drives under management as we review their quarterly and lifetime failure rates as of the end of Q2 2022. Along the way, we’ll share our observations and insights on the data presented and, as always, we look forward to you doing the same in the comments section at the end of the post.

Lifetime Hard Drive Failure Rates

This report, we’ll change things up a bit and start with the lifetime failure rates. We’ll cover the Q2 data later on in this post. As of June 30, 2022, Backblaze was monitoring 215,424 hard drives used to store data. For our evaluation, we removed 413 drives from consideration as they were used for testing purposes or drive models which did not have at least 60 drives. This leaves us with 215,011 hard drives grouped into 27 different models to analyze for the lifetime report.

Notes and Observations About the Lifetime Stats

The lifetime annualized failure rate for all the drives listed above is 1.39%. That is the same as last quarter and down from 1.45% one year ago (6/30/2021).

A quick glance down the annualized failure rate (AFR) column identifies the three drives with the highest failure rates:

The 8TB HGST (model: HUH728080ALE604) at 6.26%.
The Seagate 14TB (model: ST14000NM0138) at 4.86%.
The Toshiba 16TB (model: MG08ACA16TA) at 3.57%.

What’s common between these three models? The sample size, in our case drive days, is too small, and in these three cases leads to a wide range between the low and high confidence interval values. The wider the gap, the less confident we are about the AFR in the first place.

In the table above, we list all of the models for completeness, but it does make the chart more complex. We like to make things easy, so let’s remove those drive models that have wide confidence intervals and only include drive models that are generally available. We’ll set our parameters as follows: a 95% confidence interval gap of 0.5% or less, a minimum drive days value of one million to ensure we have a large enough sample size, and drive models that are 8TB or more in size. The simplified chart is below.

To summarize, in our environment, we are 95% confident that the AFR listed for each drive model is between the low and high confidence interval values.

Computing the Annualized Failure Rate

We use the term annualized failure rate, or AFR, throughout our Drive Stats reports. Let’s spend a minute to explain how we calculate the AFR value and why we do it the way we do. The formula for a given cohort of drives is:

AFR = ( drive_failures / ( drive_days / 365 )) * 100

Let’s define the terms used:

Cohort of drives: The selected set of drives (typically by model) for a given period of time (quarter, annual, lifetime).
AFR: Annualized failure rate, which is applied to the selected cohort of drives.
drive_failures: The number of failed drives for the selected cohort of drives.
drive_days: The number of days all of the drives in the selected cohort are operational during the defined period of time of the cohort (i.e., quarter, annual, lifetime).

For example, for the 16TB Seagate drive in the table above, we have calculated there were 117 drive failures and 4,117,553 drive days over the lifetime of this particular cohort of drives. The AFR is calculated as follows:

AFR = ( 117 / ( 4,117,553 / 365 )) * 100 = 1.04%
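The same calculation can be expressed as a short Python function. This is a minimal sketch of the formula above; the function name is our own choice for illustration.

```python
def annualized_failure_rate(drive_failures, drive_days):
    """Return the AFR as a percentage for a cohort of drives."""
    if drive_days <= 0:
        raise ValueError("drive_days must be positive")
    drive_years = drive_days / 365  # convert drive days to drive years
    return drive_failures / drive_years * 100


# The 16TB Seagate example: 117 failures over 4,117,553 drive days.
afr = annualized_failure_rate(117, 4_117_553)
print(f"{afr:.2f}%")  # prints 1.04%
```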

Why Don’t We Use Drive Count?

Our environment is very dynamic when it comes to drives entering and leaving the system; a 12TB HGST drive fails and is replaced by a 12TB Seagate, a new Backblaze Vault is added and 1,200 new 14TB Toshiba drives are added, a Backblaze Vault of 4TB drives is retired, and so on. Using drive count is problematic because it assumes a stable number of drives in the cohort over the observation period. Yes, we will concede that with enough math you can make this work, but rather than going back to college, we keep it simple and use drive days as it accounts for the potential change in the number of drives during the observation period and apportions each drive’s contribution accordingly.

For completeness, let’s calculate the AFR for the 16TB Seagate drive using a drive count-based formula given there were 16,860 drives and 117 failures.

Drive Count AFR = ( 117 / 16,860 ) * 100 = 0.69%

While the drive count AFR is much lower, the assumption that all 16,860 drives were present the entire observation period (lifetime) is wrong. Over the last quarter, we added 3,601 new drives, and over the last year, we added 12,003 new drives. Yet, all of these were counted as if they were installed on day one. In other words, using drive count AFR in our case would misrepresent drive failure rates in our environment.
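A few lines of Python show the gap between the two approaches using the figures quoted above. This is a sketch for illustration only; the function names are hypothetical.

```python
def drive_days_afr(drive_failures, drive_days):
    """AFR (%) based on drive days, as used in the Drive Stats reports."""
    return drive_failures / (drive_days / 365) * 100


def drive_count_afr(drive_failures, drive_count):
    """Failure percentage based on a simple drive count."""
    return drive_failures / drive_count * 100


failures, drive_days, drive_count = 117, 4_117_553, 16_860

print(f"drive days AFR:  {drive_days_afr(failures, drive_days):.2f}%")
print(f"drive count AFR: {drive_count_afr(failures, drive_count):.2f}%")

# Average time in service per drive, which the drive count formula ignores.
average_days = drive_days / drive_count
print(f"average days in service: {average_days:.0f}")
```

The average drive in this cohort has been in service for roughly 244 days, well under a year, which is why dividing failures by the drive count alone understates the annualized rate.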

How We Determine Drive Failure

Today, we classify drive failure into two categories: reactive and proactive. Reactive failures are where the drive has failed and won’t or can’t communicate with our system. Proactive failures are where failure is imminent based on errors the drive is reporting which are confirmed by examining the SMART stats of the drive. In this case, the drive is removed before it completely fails.

Over the last few years, data scientists have used the SMART stats data we’ve collected to see if they can predict drive failure using various statistical methodologies, and more recently, artificial intelligence and machine learning techniques. The ability to accurately predict drive failure, with minimal false positives, will optimize our operational capabilities as we scale our storage platform.

SMART Stats

SMART stands for Self-monitoring, Analysis, and Reporting Technology and is a monitoring system included in hard drives that reports on various attributes of the state of a given drive. Each day, Backblaze records and stores the SMART stats that are reported by the hard drives we have in our data centers. Check out this post to learn more about SMART stats and how we use them.

Q2 2022 Hard Drive Failure Rates

For the Q2 2022 quarterly report, we tracked 215,011 hard drives broken down by drive model into 27 different cohorts using only data from Q2. The table below lists the data for each of these drive models.

Notes and Observations on the Q2 2022 Stats

Breaking news, the OG stumbles: The 6TB Seagate drives (model: ST6000DX000) finally had a failure this quarter—actually, two failures. Given this is the oldest drive model in our fleet with an average age of 86.7 months of service, a failure or two is expected. Still, this was the first failure by this drive model since Q3 of last year. At some point in the future we can expect these drives will be cycled out, but with their lifetime AFR at just 0.87%, they are not first in line.

Another zero for the next OG: The next oldest drive cohort in our collection, the 4TB Toshiba drives (model: MD04ABA400V) at 85.3 months, had zero failures for Q2. The last failure was recorded a year ago in Q2 2021. Their lifetime AFR is just 0.79%, although their lifetime confidence interval gap is 1.3%, which as we’ve seen means we are lacking enough data to be truly confident of the AFR number. Still, at one failure per year, they could last another 97 years—probably not.

More zeroes for Q2: Three other drives had zero failures this quarter: the 8TB HGST (model: HUH728080ALE604), the 14TB Toshiba (model: MG07ACA14TEY), and the 16TB Toshiba (model: MG08ACA16TA). As with the 4TB Toshiba noted above, these drives have very wide confidence interval gaps driven by a limited number of data points. For example, the 16TB Toshiba had the most drive days, 32,064, of any of these drive models. We would need to have at least 500,000 drive days in a quarter to get to a 95% confidence interval. Still, it is entirely possible that any or all of these drives will continue to post great numbers over the coming quarters; we’re just not 95% confident yet.

Running on fumes: The 4TB Seagate drives (model: ST4000DM000) are starting to show their age, 80.3 months on average. Their quarterly failure rate has increased each of the last four quarters to 3.42% this quarter. We have deployed our drive cloning program for these drives as part of our data durability program, and over the next several months, these drives will be cycled out. They have served us well, but it appears they are tired after nearly seven years of constant spinning.

The AFR increases, again: In Q2, the AFR increased to 1.46% for all drive models combined. This is up from 1.22% in Q1 2022 and up from 1.01% a year ago in Q2 2021. The aging 4TB Seagate drives are part of the increase, but the failure rates of both the Toshiba and HGST drives have increased as well over the last year. This appears to be related to the aging of the entire drive fleet and we would expect this number to go down as older drives are retired over the next year.

Four Thousand Storage Servers

In the opening paragraph, we noted there were 4,020 boot drives. What may not be obvious is that this equates to 4,020 storage servers. These are 4U servers with 45 or 60 drives in each with drives ranging in size from 4TB to 16TB. The smallest is 180TB of raw storage space (45 * 4TB drives) and the largest is 960TB of raw storage (60 * 16TB drives). These servers are a mix of Backblaze Storage Pods and third-party storage servers. It’s been a while since our last Storage Pod update, so look for something in late Q3 or early Q4.

Drive Stats at DEFCON

If you will be at DEFCON 30 in Las Vegas, I will be speaking live at the Data Duplication Village (DDV) at 1 p.m. on Friday, August 12th. The all-volunteer DDV is located in the lower level of the executive conference center of the Flamingo hotel. We’ll be talking about Drive Stats, SSDs, drive life expectancy, SMART stats, and more. I hope to see you there.

Never Miss the Drive Stats Report

Sign up for the Drive Stats Insiders newsletter and be the first to get Drive Stats data every quarter as well as the new Drive Stats SSD edition.

The Hard Drive Stats Data

The complete data set used to create the information used in this review is available on our Hard Drive Test Data page. You can download and use this data for free for your own purpose. All we ask are three things: 1) you cite Backblaze as the source if you use the data, 2) you accept that you are solely responsible for how you use the data, and 3) you do not sell this data to anyone; it is free.

If you want the tables and charts used in this report, you can download the .zip file from Backblaze B2 Cloud Storage which contains the .jpg and/or .xlsx files as applicable.
Good luck and let us know if you find anything interesting.

Want More Drive Stats Insights?

Check out our 2021 Year-end Drive Stats Report.

Interested in the SSD Data?

Read our first SSD-based Drive Stats Report.

The post Backblaze Drive Stats for Q2 2022 appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

Get a Clear Picture of Your Data Spread With BackBlaze and DataIntell

2022-07-28 Jennifer Newman

Post Syndicated from Jennifer Newman original https://www.backblaze.com/blog/get-a-clear-picture-of-your-data-spread-with-backblaze-and-dataintell/

Do you know where your data is? It’s a question more and more businesses have to ask themselves, and if you don’t have a definitive answer, you’re not alone. The average company manages over 100TB of data. By 2025, it’s estimated that 463 exabytes of data will be created each day globally. That’s a massive amount of data to keep tabs on.

But understanding where your data lives is just one part of the equation. Your next question is probably, “How much is it costing me?” A new partnership between Backblaze and DataIntell can help you get answers to both questions.

What Is DataIntell?

DataIntell is an application designed to help you better understand your data and storage utilization. This analytic tool helps identify old and unused files and gives better insights into data changes, file duplication, and used space over time. It is designed to help you manage large amounts of data growth. It provides detailed, user friendly, and accurate analytics of your data use, storage, and cost, allowing you to optimize your storage and monitor its usage no matter where it lives—on-premises or in the cloud.

How Does Backblaze Integrate With DataIntell?

Together, DataIntell and Backblaze provide you with the best of both worlds. DataIntell allows you to identify and understand the costs and security of your data today, while Backblaze provides you with a simple, scalable, and reliable cloud storage option for the future.

“DataIntell offers a unique storage analysis and data management software which facilitates decision making while reducing costs and increasing efficiency, either for on-prem, cloud, or archives. With Backblaze and DataIntell, organizations can now manage their data growth and optimize their storage cost with these two simple and easy-to-use solutions.”
—Olivier Rivard, President/CTO, DataIntell

How Does This Partnership Benefit Joint Customers?

This partnership delivers value to joint customers in three key areas:

It allows you to make the most of your data wherever it lives, at speed, and with a 99.9% uptime SLA—no cold delays or speed premiums.
You can easily migrate on-premises data and data stored on tape to scalable, affordable cloud storage.
You can stretch your budget (further) with S3-compatible storage predictably priced at a fraction of the cost of other cloud providers.

“Unlike legacy providers, Backblaze offers always-hot storage in one tier, so there’s no juggling between tiers to stay within budget. By partnering with DataIntell, we can offer a cost-effective solution to joint customers looking to simplify their storage spend and data management efforts.”
—Nilay Patel, Vice President of Sales and Partnerships, Backblaze

Getting Started With Backblaze B2 and DataIntell

Are you looking for more insight into your data landscape? Contact our Sales team today to get started.

The post Get a Clear Picture of Your Data Spread With BackBlaze and DataIntell appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

Streaming Media with Backblaze B2: A Data Storage Guide

2022-07-27 Pat Patterson

Post Syndicated from Pat Patterson original https://www.backblaze.com/blog/roll-camera-streaming-media-from-backblaze-b2/

You can store petabytes of audio and video assets in Backblaze B2 Cloud Storage, and lots of our customers do. While many customers archive their digital assets for long-term safekeeping, a growing number of customers use Backblaze B2 to deliver media assets to their end consumers, often embedded in web pages.

Embedding audio and video files in web pages for playback in the browser is nothing new, but there are a lot of ingredients in the mix, and it can be tricky to get right. Streaming media from Backblaze B2 simplifies sharing stored data. Whether you’re delivering content for web applications or supporting business workflows, Backblaze B2 offers a seamless way to store, protect, and share data.

After reading this blog post, you’ll be ready to deliver media assets from Backblaze B2 to website users reliably and affordably. I’ll cover:

A little bit of history on how streaming media came to be.
A primer on the various strands of technology and how they work.
A how-to guide for streaming media from your Backblaze B2 account.

The evolution of internet media streaming

Back in the early days of the web, audio and video content was a rarity. Most people connected to the internet via a dial-up link, and just didn’t have the bandwidth to stream audio, let alone video, content to their computer. Consequently, the early web standards specified how browsers should show images in web pages via the <img> tag, but made no mention of audio/visual resources. Digital storage media like floppy disks and magnetic disks were often used for storing and sharing multimedia content offline.

As bandwidth increased to the point where it was possible for more of us to stream large media files, Adobe’s Flash Player became the de facto standard for playing audio and video in the web browser. Flash allowed websites to embed and stream media files directly, revolutionizing the way content was consumed online. When YouTube launched, for example, in early 2005, it required the Flash Player plug-in to be installed in the browser to view the videos.

However, reliance on Flash presented limitations, including security vulnerabilities and compatibility issues across devices. As a result, web developers began exploring native solutions for streaming media. This resulted in a significant milestone in digital data delivery, as browsers and devices started incorporating built-in capabilities for playing media.

Meanwhile, the evolution of data storage media, including external hard drives, solid-state drives (SSDs), and cloud storage solutions, allowed users to store and access large media libraries more efficiently.

The HTML5 video element: Transforming online media playback

At around the same time, a consortium of the major browser vendors started work on a new version of HTML, the markup language that had been a part of the web since its inception. A major goal of HTML5 was to support multimedia content natively, and so, in its initial release in 2008, the specification introduced new <audio> and <video> tags to embed audiovisual content directly in web pages, without requiring additional plugins like Flash.

This development was a game-changer in digital storage media and content delivery, as it eliminated the need for third-party software and made streaming video and audio more accessible across devices. The shift also facilitated better cross-platform compatibility, a critical feature for supporting consumer devices such as smartphones, tablets, and smart TVs.

While web pages are written in HTML, they are delivered from the web server to the browser via the HTTP protocol. Web servers don’t just deliver web pages, of course—images, scripts, audio, and video files are also delivered via HTTP.

HTML5’s multimedia capabilities also introduced support for various codecs, enabling better compression and playback quality for formats like MP4. This innovation marked a significant step forward in how websites handled and delivered rich media experiences.

How streaming technology works

Understanding the key components of streaming technology will help you set up seamless digital data delivery on your site. Here, we’ll cover:

Streaming vs. progressive download.
HTTP 1.1 byte range serving.
Media file formats.
MIME types.

Streaming vs. progressive download

In common usage, the term, “streaming,” in the context of web media, can refer to any situation where the user can request content (for example, press a play button) and consume that content almost immediately, as opposed to downloading a media file, where the user has to wait to receive the entire file before they can watch or listen. This is particularly useful for media files stored on data storage media such as solid-state drives or cloud-based storage media.

Technically, however, the term, “streaming,” refers to a continuous delivery method, and uses transport protocols such as RTSP rather than HTTP. This form of streaming requires specialized software to handle data traffic in real time.

Progressive download blends aspects of downloading and streaming. When the user presses play on a video on a web page, the browser starts to download the video file. However, the browser may begin playback before the download is complete. So, the user experience of progressive download is much the same as streaming, and I’ll use the term, “streaming” in its colloquial sense in this blog post. Progressive downloads rely on efficient data storage technology and protocols like HTTP to deliver content to consumer devices, such as laptops or smartphones.

Both streaming and progressive download are widely used in cloud environments today, and enable websites to serve media content directly from servers—whether local or in the cloud.

HTTP 1.1 byte range serving

HTTP enables progressive download through a feature known as range serving. Introduced to HTTP in version 1.1 back in 1997, byte range serving allows an HTTP client, such as your browser, to request a specific range of bytes from a resource, such as a video file, rather than the entire resource all at once.

Imagine you’re watching a video online and realize you’ve already seen the first half. You can click the video’s slider control, picking up the action at the appropriate point. Without byte range serving, your browser would be downloading the whole video, and you might have to wait several minutes for it to reach the halfway point and start playing. With byte range serving, the browser can specify a range of bytes in each request, so it’s easy for the browser to request data from the middle of the video file, skipping any amount of content almost instantly.

Byte range serving significantly enhances user experience and optimizes network bandwidth. It’s especially beneficial when serving large media files stored in cloud storage.

Backblaze B2 supports byte range serving in downloads via both the Backblaze B2 Native and S3 Compatible APIs. (Check out this post for an explainer of the differences between the two.)

Here’s an example range request for the first 10 bytes of a file in a Backblaze B2 bucket, using the cURL command line tool.

You can see the Range header in the request, specifying bytes zero to nine, and the Content-Range header indicating that the response indeed contains bytes zero to nine of a total of 555,214,865 bytes. Note also the HTTP status code: 206, signifying a successful retrieval of partial content, rather than the usual 200.

% curl -I https://metadaddy-public.s3.us-west-004.backblazeb2.com/example.mp4 -H 'Range: bytes=0-9'

HTTP/1.1 206 
Accept-Ranges: bytes
Last-Modified: Tue, 12 Jul 2022 20:06:09 GMT
ETag: "4e104e1bd9a2111002a74c9c798515e6-106"
Content-Range: bytes 0-9/555214865
x-amz-request-id: 1e90f359de28f27a
x-amz-id-2: aMYY1L2apOcUzTzUNY0ZmyjRRZBhjrWJz
x-amz-version-id: 4_zf1f51fb913357c4f74ed0c1b_f202e87c8ea50bf77_d20220712_m200609_c004_v0402006_t0054_u01657656369727
Content-Type: video/mp4
Content-Length: 10
Date: Tue, 12 Jul 2022 20:08:21 GMT
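If you would rather work with range requests from code, the following Python sketch builds the same request with the standard library and parses the Content-Range value shown above. It is a minimal illustration: the helper names are hypothetical, the request is constructed but never sent, and the parser only handles the simple `bytes first-last/total` form.

```python
import re
import urllib.request


def build_range_request(url, first_byte, last_byte):
    """Create (but do not send) a HEAD request for an inclusive byte range."""
    headers = {"Range": f"bytes={first_byte}-{last_byte}"}
    return urllib.request.Request(url, headers=headers, method="HEAD")


CONTENT_RANGE = re.compile(r"bytes (\d+)-(\d+)/(\d+)")


def parse_content_range(value):
    """Split 'bytes 0-9/555214865' into (first, last, total) integers."""
    match = CONTENT_RANGE.fullmatch(value.strip())
    if match is None:
        raise ValueError(f"unsupported Content-Range: {value!r}")
    first, last, total = (int(group) for group in match.groups())
    return first, last, total


request = build_range_request(
    "https://metadaddy-public.s3.us-west-004.backblazeb2.com/example.mp4", 0, 9
)
print(request.get_method(), request.get_header("Range"))

first, last, total = parse_content_range("bytes 0-9/555214865")
print(f"received bytes {first}-{last} ({last - first + 1} bytes) of {total}")
```

Passing the request to `urllib.request.urlopen` would send it; a successful partial response reports status 206, as in the cURL output above.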

I recommend that you use S3-style URLs for media content, as shown in the above example, rather than Backblaze B2-style URLs of the form: https://f004.backblazeb2.com/file/metadaddy-public/example.mp4.

The B2 Native API responds to a range request that specifies the entire content, e.g., Range: bytes=0-, with HTTP status 200, rather than 206. Safari interprets that response as indicating that Backblaze B2 does not support range requests, and thus will not start playing content until the entire file is downloaded.

The S3 Compatible API returns HTTP status 206 for all range requests, regardless of whether they specify the entire content, so Safari will allow you to play the video as soon as the page loads.

Media file formats for storage and streaming

The third ingredient in streaming media successfully is the file format. There are several container formats for audio and video data, with familiar file name extensions such as .mov, .mp4, and .avi. Within these containers, media data can be encoded in many different ways by software components known as codecs, an abbreviation of coder/decoder.

Codecs play a critical role in digital storage media. They compress and decompress the data to optimize storage capacity while preserving playback quality. Choosing the right codec ensures efficient long-term storage and seamless delivery of high-quality media.

We could write a whole series of blog articles on containers and codecs, but the important point is that the media’s metadata—information regarding how to play the media, such as its length, bit rate, dimensions, and frames per second—must be located at the beginning of the video file, so that this information is immediately available as download starts. This optimization is known as “Fast Start” and is supported by software such as ffmpeg and Premiere Pro. Without this, there might be playback delays, negatively impacting the user experience.
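For MP4 and MOV files, this metadata lives in a top-level box named moov, and the media data itself lives in mdat, so one way to check for Fast Start is to confirm that moov appears before mdat. The Python sketch below does that for data held in memory. It is a minimal illustration with hypothetical helper names, exercised here with synthetic boxes rather than a real video; in ffmpeg, the option that moves the metadata to the front when writing MP4 output is `-movflags +faststart`.

```python
import struct
from typing import List


def top_level_boxes(data: bytes) -> List[str]:
    """Return the types of the top-level boxes in MP4/MOV data, in file order."""
    boxes = []
    offset = 0
    while offset + 8 <= len(data):
        # Each box starts with a 4-byte big-endian size and a 4-byte type.
        size, box_type = struct.unpack_from(">I4s", data, offset)
        header_length = 8
        if size == 1:
            # A size of 1 means a 64-bit size follows the type field.
            if offset + 16 > len(data):
                break
            (size,) = struct.unpack_from(">Q", data, offset + 8)
            header_length = 16
        elif size == 0:
            # A size of 0 means the box runs to the end of the file.
            size = len(data) - offset
        if size < header_length:
            break  # malformed box; stop rather than loop forever
        boxes.append(box_type.decode("latin-1"))
        offset += size
    return boxes


def is_fast_start(data: bytes) -> bool:
    """True when the moov (metadata) box precedes the mdat (media data) box."""
    boxes = top_level_boxes(data)
    if "moov" not in boxes or "mdat" not in boxes:
        return False
    return boxes.index("moov") < boxes.index("mdat")


def make_box(box_type: bytes, payload: bytes = b"") -> bytes:
    """Build a synthetic box for demonstration purposes."""
    return struct.pack(">I4s", 8 + len(payload), box_type) + payload


fast = make_box(b"ftyp", b"isom") + make_box(b"moov", b"meta") + make_box(b"mdat", b"x" * 16)
slow = make_box(b"ftyp", b"isom") + make_box(b"mdat", b"x" * 16) + make_box(b"moov", b"meta")

print(top_level_boxes(fast), is_fast_start(fast))
print(top_level_boxes(slow), is_fast_start(slow))
```

For a real file you would only need to read the box headers rather than load the whole file into memory; the in-memory version keeps the sketch short.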

Understanding MIME types for reliable media playback

The final piece of the puzzle is the media file’s MIME type, which identifies the file format for the browser or media player. You can see a MIME type in the Content-Type header in the above example request: video/mp4. You must specify the MIME type when you upload a file to Backblaze B2. You can set it explicitly, or use the special value b2/x-auto to tell Backblaze B2 to set the MIME type according to the file name’s extension, if one is present. It is important to set the MIME type correctly for reliable playback.
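As a rough sketch of how you might choose the value in an upload script, Python’s standard `mimetypes` module can guess a MIME type from the file name, falling back to `b2/x-auto` when it has no answer. The function name `content_type_for` is hypothetical, and the upload call itself is left out.

```python
import mimetypes

B2_AUTO = "b2/x-auto"  # asks Backblaze B2 to pick the type from the extension


def content_type_for(file_name):
    """Guess a MIME type locally; defer to Backblaze B2 if the guess fails."""
    guessed, _encoding = mimetypes.guess_type(file_name)
    return guessed or B2_AUTO


for name in ("example.mp4", "soundtrack.mp3", "file_without_extension"):
    print(name, "->", content_type_for(name))
```

Setting the type explicitly keeps the decision visible in your own code, while the fallback avoids uploading a file with a wrong guess.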

For those managing high-capacity storage, getting MIME types right helps streamline delivery and maintain compatibility with newer software and streaming protocols.

Putting it all together

So, we’ve covered the ingredients for streaming media from Backblaze B2 directly to a web page:

The HTML5 <audio> and <video> elements.
HTTP 1.1 byte range serving.
Encoding media for Fast Start.
Storing media files in Backblaze B2 with the correct MIME type.

Here’s an HTML5 page with a minimal example of an embedded video file:

<!DOCTYPE html>
<html>
  <body>
    <h1>Video</h1>
    <video controls src="my-video.mp4" width="640px"></video>
  </body>
</html>

The controls attribute tells the browser to show the default set of controls for playback. Setting the width of the video element makes it a more manageable size than the default, which is the video’s dimensions.

Managing download costs with efficient data storage solutions

When serving media files from your account, you need to consider download charges as part of your overall data management strategy. Backblaze offers a few ways to manage these charges. To start, the first 1GB of data downloaded from your Backblaze B2 account per day is free. After that, we charge $0.01/GB—notably less than AWS at $0.05+/GB, Azure at $0.04+, and Google Cloud Platform at $0.12.
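To estimate what this means for your own traffic, here is a minimal Python sketch of the Backblaze B2 download charge described above: the first 1GB per day is free, and each additional GB costs $0.01. The function name and the 50GB example are assumptions for illustration, and the sketch ignores any CDN arrangements.

```python
FREE_GB_PER_DAY = 1.0   # daily free download allowance quoted above
RATE_PER_GB = 0.01      # Backblaze B2 download price per GB quoted above


def daily_download_charge(gb_downloaded, rate_per_gb=RATE_PER_GB, free_gb=FREE_GB_PER_DAY):
    """Return the download charge in dollars for one day of traffic."""
    if gb_downloaded < 0:
        raise ValueError("gb_downloaded must not be negative")
    billable_gb = max(0.0, gb_downloaded - free_gb)
    return billable_gb * rate_per_gb


# Hypothetical example: 50GB of video served in one day.
charge = daily_download_charge(50)
print(f"${charge:.2f}")
```

The `rate_per_gb` and `free_gb` parameters let you plug in another provider’s published rate for a rough comparison.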

We also cover the download fees between Backblaze B2 and many CDN partners like Cloudflare, Fastly, and Bunny.net, so you can serve content closer to your end users via their edge networks. You’ll want to make sure you understand if there are limits on your media downloads from those vendors by checking the terms of service for your CDN account. Some service levels do restrict downloads of media content.

Start streaming: Your media storage journey begins

Now you know everything you need to know to get started encoding, uploading, and serving audio/visual content from Backblaze B2 Cloud Storage. Backblaze B2 is a great way to experiment with multimedia: the first 10GB of storage is free, and so is the first 1GB of data downloaded each day.

Sign up free, no credit card required, and start building your long-term backup and streaming infrastructure with Backblaze B2.

The post Streaming Media with Backblaze B2: A Data Storage Guide appeared first on Backblaze Blog | Cloud Storage & Cloud Backup

Server Backup 101: Disaster Recovery Planning

2022-07-19 Kari Rivas

Post Syndicated from Kari Rivas original https://www.backblaze.com/blog/server-backup-101-disaster-recovery-planning/

In any business, time is money. What may shock you is how much money that time is actually worth. According to Gartner, the average cost of one hour of downtime for a business is roughly $300,000. That’s $5,600 a minute. Multiply that out by the amount of time it takes to recover from data theft, sabotage, or a natural disaster, and you could easily be looking at millions of dollars in lost revenue. That is, unless you’ve planned ahead with an effective disaster recovery plan.
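
To put your own numbers on this, a small sketch like the one below multiplies an outage length by a per-minute cost. It defaults to the $5,600 per-minute figure quoted above; the function name is hypothetical, and you would substitute your own estimate. Because the per-minute and per-hour figures above are both rounded, the hourly result comes out slightly higher than $300,000.

```python
COST_PER_MINUTE = 5_600  # USD, the per-minute figure quoted above


def downtime_cost(minutes: float, cost_per_minute: float = COST_PER_MINUTE) -> float:
    """Return the estimated cost in USD of an outage lasting `minutes`."""
    return minutes * cost_per_minute


if __name__ == "__main__":
    for hours in (1, 8, 24):
        print(f"{hours:>2} h outage: ${downtime_cost(hours * 60):,.0f}")
```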

Even one hour of lost time due to a cyberattack or natural disaster could adversely affect your business operations. Read on to learn how to develop an effective disaster recovery plan so you can quickly rebound no matter what happens, including:

Knowing what a disaster recovery plan is and why you need it.
Developing an effective strategy.
Identifying key roles.
Prioritizing business operations and objectives.
Deploying backups.

What Is a Disaster Recovery Plan?

A disaster recovery plan is made up of resources and processes that a business can use to restore apps, data, digital assets, equipment, and network operations in the event of any unplanned disruption.

Events such as natural disasters (floods, fires, earthquakes, etc.), theft, and cybercrime often interrupt business operations or restrict access to data. The goal of a disaster recovery plan is to get back up and running as quickly and smoothly as possible.

Some companies will choose to write their own disaster recovery plans, while others may contract with a managed service provider (MSP) specializing in disaster recovery as a service (DRaaS). Either way, crafting a disaster recovery plan that covers you for any contingency is crucial.

Why Do You Need a Disaster Recovery Plan?

A disaster recovery plan is not just a good idea; it is an essential component of your business. Cybercrime is on the rise, targeting small and medium-sized businesses just as often as large corporations. According to Cybersecurity Magazine, 43% of recent data breaches affected small and medium-sized businesses. Additionally, you could be cut off from your data by power outages, hardware failure, data corruption, and natural occurrences that restrict IT workflows. So, why do you need a disaster recovery plan? A few key benefits rise to the top:

Your disaster recovery plan will ensure business continuity in the case of a disaster. Imagine the confidence of knowing that no matter what happens, your business is prepared and can continue operations seamlessly.
An effective disaster recovery plan will help you get back up and running faster and more efficiently.
The plan also helps to communicate to your entire team, from top to bottom, what to do in the event of an emergency.

Writing a Disaster Recovery Plan: What Should Your Disaster Recovery Plan Include?

A solid disaster recovery plan should include five main elements, which we’ll detail below:

An effective strategy.
Key team members who can carry out the plan.
Clear objectives and priorities.
Solid backups.
Testing protocols.

An Effective Strategy

One of the most critical aspects of your disaster recovery plan should be your strategy. Typically, the details of a disaster recovery plan include steps for prevention, preparation, mitigation, and recovery. Think about both the big picture and fine details when putting together the pieces.

Disaster Recovery Planning Case Study: Santa Cruz Skateboards

Santa Cruz Skateboards safeguarded decades worth of data with a disaster recovery plan and backups to prevent loss from the threat of tsunamis on the California coast. Read more about how they did it.

Some tips for creating an effective strategy include:

Identify possible disasters. Consider the types of disasters your business may encounter and design your plan around those. Every business is susceptible to cybercrime, which should be a significant component of your plan. If your business is located in a disaster prone location, let that dictate your plan objectives.
Plan for “minor” disasters. A “major” disaster like an earthquake could take out the entire office and on-premises infrastructure, but “minor” disasters can also be disruptive. Good employees make mistakes and delete things, and bad employees sometimes make worse mistakes. A disaster recovery plan protects you from those “minor” disasters as well.
Create multiple disaster recovery plans. You may need to create different versions of your disaster recovery plan based on specific scenarios and the severity of the disaster. For example, you may need a plan that responds to a cyberattack and restores data quickly, while another plan may deal with hardware destruction and replacement rather than data restoration.
Plan from your recovery backward. Think about what you need to accomplish with your disaster recovery and plan your backup routine to support it. Then, after your plan is written, go back and ensure that your backup routine follows the plan initiatives and accomplishes the goals in an acceptable time frame.
Develop KPIs. Include critical key performance indicators (KPIs) in the plan, such as a recovery time objective (RTO) and recovery point objective (RPO). RTO refers to how quickly you intend to restore your systems after a disaster, and RPO is the maximum amount of data loss you can safely incur.
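
The RTO and RPO definitions above can be turned into simple checks to run after a test restore. The following is a minimal sketch; the function names and the sample timestamps are assumptions made for illustration.

```python
from datetime import datetime, timedelta


def data_loss_window(last_backup: datetime, disaster: datetime) -> timedelta:
    """Data created after the last backup and before the disaster is lost."""
    return disaster - last_backup


def meets_rpo(last_backup: datetime, disaster: datetime, rpo: timedelta) -> bool:
    """True if the data loss window stays within the recovery point objective."""
    return data_loss_window(last_backup, disaster) <= rpo


def meets_rto(disaster: datetime, restored: datetime, rto: timedelta) -> bool:
    """True if systems were restored within the recovery time objective."""
    return restored - disaster <= rto


if __name__ == "__main__":
    # Hypothetical incident: nightly backup at 02:00, failure at 15:30,
    # systems back online at 19:00 the same day.
    last_backup = datetime(2022, 7, 19, 2, 0)
    disaster = datetime(2022, 7, 19, 15, 30)
    restored = datetime(2022, 7, 19, 19, 0)
    print("RPO of 24 h met:", meets_rpo(last_backup, disaster, timedelta(hours=24)))
    print("RTO of 4 h met:", meets_rto(disaster, restored, timedelta(hours=4)))
```

Working through a check like this is one way to plan from your recovery backward: if the RPO check fails for a realistic scenario, the backup schedule needs to run more often.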

Establish the Key Team Members and Their Roles and Hierarchy

Another crucial component of your disaster recovery plan is identifying key team members to carry out the instructions. You must clearly define roles and hierarchy for effectiveness. Consider the following when building your disaster recovery team:

Communicate roles and hierarchy. Ensure that each team member knows their role in the plan and understands where they land in the hierarchy. Build in redundancy if a major player is unavailable.
Develop a master contact list. Create a master list with updated contact information for each team member and update it regularly as things change. Be sure the list includes everyone’s cell phone and landline numbers (if applicable) and emergency contacts for each person. Don’t assume you will have working internet and consider alternative ways to reach critical team members in the middle of the night.
Plan on how to manage your team. Think about how you will stay organized and manage your team to function 24/7 until you resolve the disaster.

Prioritize Business Operations and Objectives

Another important aspect of your disaster recovery plan is prioritizing business operations and objectives and crafting your plan around those.

Identify the most critical aspects of the business that need to be restored first. Then, focus on those and leave the less essential things until later. Understand that it is not feasible to restore everything at once. Instead, you must get the most critical business areas up and running first, and only then move on to other, less crucial parts of the system. Detail these priorities in your plan so that no one wastes time on nonessential operations.

Know How to Deploy Your Backups

Backups should be a routine function for your organization, and you should know them inside and out. Be sure to familiarize yourself with every aspect of the backup process, including where data is stored, how recent it is, and how to restore it at a moment’s notice.

Having a reliable backup plan could save your business. You don’t want to waste precious time figuring out where the latest backup is, where it’s stored (whether that’s locally or on the cloud), or how to access it. Off-site cloud storage is a safe, reliable way to store and retrieve your data, especially in the event of a disaster.

Practice restoring your backups regularly to test their viability. Document the process for restoring in case you are unavailable and someone else has to take over. Data restoration should be a central part of your disaster recovery plan. Remember, backups are not your entire disaster recovery plan but only a piece of the overall system.
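
One simple way to test a restore is to compare a checksum of the restored file against the original. The sketch below does this with SHA-256 from Python's standard library; the function names and file paths are hypothetical, and a real test would cover a representative sample of files rather than a single one.

```python
import hashlib


def sha256_of(path, chunk_size=1024 * 1024) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def restore_matches(original, restored) -> bool:
    """True if the restored file is byte-for-byte identical to the original."""
    return sha256_of(original) == sha256_of(restored)


if __name__ == "__main__":
    import sys

    # Usage: python verify_restore.py <original-file> <restored-file>
    if len(sys.argv) == 3:
        ok = restore_matches(sys.argv[1], sys.argv[2])
        print("Restore verified" if ok else "Restore does NOT match the original")
```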

Foolproof Your Plan With Disaster Recovery Testing

The best-laid plans don’t always work out. Therefore, it’s essential that you foolproof your disaster recovery plan by testing it regularly (once a year, or every six months, whatever works for you). You don’t have to experience a real catastrophe; you can simulate what a disaster would look like and run through the entire process to ensure everything works as expected. Some disaster recovery testing best practices include:

Planning for the worst-case scenario. Think about things like access to a car, how you will get to the office, and how you will access your backups if they are stored online and you don’t have internet. Prepare by having multiple alternate plans (A, B, C, etc.). Remember, disasters come in all shapes and sizes, so be prepared to think outside the box. When the COVID-19 pandemic started, businesses had to scramble to adjust. Prepare for anything, even minor disruptions or cut-offs from resources you rely on.
Securing resources in advance. If you need resources to make it work, such as budgetary funds, software, hardware, or services, get those approved now so you’re not stuck provisioning necessary resources in the middle of a disaster.
Regularly reviewing and updating your disaster recovery plan as things change. Team members come and go, so schedule routine updates every three to six months to ensure that everything is up to date and viable.
Distributing copies of your disaster recovery plan. All staff members, including executives, should have a copy of your plan, and you should clearly communicate how it works and what everyone’s responsibility is.
Conducting post mortems after training and simulations (or a real disaster) to determine what works and what doesn’t. Make changes to your plan accordingly.

Don’t wait until a disaster occurs before writing your disaster recovery plan. A disaster recovery plan is an ever-evolving process you must maintain as the business changes and grows so you can face anything that the future brings.

Disaster Recovery, Done.

Ready to check disaster recovery off your list? Check out our Instant Recovery in Any Cloud solution that you can use as part of your disaster recovery plan. You can run a single command to instantly see your servers, data, firewalls, and network storage. Get back up and running as soon as possible with minimal disruption and expense to your business.

The post Server Backup 101: Disaster Recovery Planning appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

Free Isn’t Always Free: A Guide to Free Cloud Tiers

2022-07-14 Amrit Singh

Post Syndicated from Amrit Singh original https://www.backblaze.com/blog/free-isnt-always-free-a-guide-to-free-cloud-tiers/

Free Isn’t Always Free

They say “the best things in life are free.” But when most cloud storage companies offer a free tier, what they really want is money. While free tiers do offer some early-stage technical founders the opportunity to test out a proof of concept or allow students to experiment without breaking the bank, their ultimate goal is to turn you into a paying customer. This isn’t always nefarious (we offer 10GB for free, so we’re playing the same game!), but some cloud vendors’ free tiers come with hidden surprises that can lead to scary bills with little warning.

The truth is that free isn’t always free. Today, we’re digging into a cautionary tale for developers and technical founders exploring cloud services to support their applications or SaaS products. Naturally, you want to know if a cloud vendor’s free tier will work for you. Understanding what to expect and how to navigate free tiers accordingly can help you avoid huge surprise bills later.

Free Tiers: A Quick Reference

Most large, diversified cloud providers offer a free tier—AWS, Google Cloud Platform, and Azure, to name a few—and each one structures theirs a bit differently:

AWS: AWS has 100+ products and services with free options ranging from “always free” to 12 months free, and each has different use limitations. For example, you get 5GB of object storage free with AWS S3 for the first 12 months, then you are billed at the respective rate.
Google Cloud Platform: Google offers a $300 credit good for 90 days so you can explore services “for free.” They also offer an “always free tier” for specific services like Cloud Storage, Compute Engine, and several others that are free to a certain limit. For example, you get 5GB of storage for free and 1GB of network egress for their Cloud Storage service.
Azure: Azure offers a free trial similar to Google’s but with a shorter time frame (30 days) and lower credit amount ($200). It gives you the option to move up to paid when you’ve used up your credits or your time expires. Azure also offers a range of services that are free for 12 months and have varying limits and thresholds as well as an “always free tier” option.

After even a quick review of the free tier offers from major cloud providers, you can glean some immediate takeaways:

You can’t rely on free tiers or promotional credits as a long-term solution. They work well for testing a proof of concept or a minimum viable product without making a big commitment, but they’re not going to serve you past the time or usage limits.
“Free” has different mileage depending on the platform and service. Keep that in mind before you spin up servers and resources, and read the fine print as it relates to limitations.
The end goal is to move you to paid. Obviously, the cloud providers want to move you from testing a proof of concept to paid, with your full production hosted and running on their platforms.

With Google Cloud Platform and Azure, you’re at least somewhat protected from being billed beyond the credits you receive since they require you to upgrade to the paid tier to continue. Thus, most of the horror stories you’ll see involve AWS. With AWS, once your trial expires or you exceed your allotted limits, you are billed the standard rate. For the purposes of this guide, we’ll look specifically at AWS.

The Problem With the AWS Free Tier

The internet is littered with cautionary tales of AWS bills run amok. A quick search for “AWS free tier bill” on Twitter or Reddit shows that it’s possible and pretty common to run up a bill on AWS’s so-called free tier…

The problem with the AWS free tier is threefold:

There are a number of ways a “free tier” instance can turn into a bill.
Safeguards against surprise bills are mediocre at best.
Surprise bills are scary, and next steps aren’t the most comforting.

Problem 1: It’s Really Easy to Go From Free to Not Free

There are a number of ways an unattended “free tier” instance turns into a bill, sometimes a catastrophically huge bill. Here are just a few:

You spin up Elastic Compute Cloud (EC2) instances for a project and forget about them until they exceed the free tier limits.
You sign up for several AWS accounts, and you can’t figure out which one is running up charges.
Your account gets hacked and used for mining crypto (yes, this definitely happens, and it results in some of the biggest surprise bills of them all).

Problem 2: Safeguards Against Surprise Bills Are Mediocre at Best

Compounding the problem is the fact that AWS keeps safeguards against surprise billing to a minimum. The free tier has limits and defined constraints, and the only way to keep your account in the free zone is to keep usage below those limits (and this is key) for each service you use.

AWS has hundreds of services, and each service comes with its own pricing structure and limits. While one AWS service might be free, it can be paired with another AWS service that’s not free or doesn’t have the same free threshold; egress between services is one example. Thus, managing your usage to keep it within the free tier can be somewhat straightforward or prohibitively complex depending on which services you use.
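
The per-service nature of the limits is easier to see in a small sketch. The code below compares usage against a table of free allowances and reports anything over the line. It does not call any AWS API: the service keys and usage figures are made up for illustration, and only the 5GB object storage allowance comes from the reference list earlier in this post.

```python
def services_over_limit(usage, free_limits):
    """Return {service: amount over the free allowance} for every service that exceeds it."""
    over = {}
    for service, used in usage.items():
        # A service with no listed allowance is treated as billable from the first unit.
        limit = free_limits.get(service, 0.0)
        if used > limit:
            over[service] = used - limit
    return over


if __name__ == "__main__":
    # Hypothetical allowances and usage; real limits vary by service and change over time.
    free_limits = {"object_storage_gb": 5.0, "egress_gb": 1.0}
    usage = {"object_storage_gb": 4.2, "egress_gb": 3.5, "cross_service_transfer_gb": 0.8}
    for service, excess in services_over_limit(usage, free_limits).items():
        print(f"{service}: {excess:.1f} over the free allowance")
```

Note how the storage usage stays inside its allowance while two related services do not, which is the pairing problem described above.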

Wait, Shouldn’t I Get Alerts?

Yes, you can get alerted if you’re approaching the free limit, but that’s not foolproof either. First, billing alarms are not instantaneous. The notification might come after you’ve already exceeded the limit. And second, not every service has alerts or alerts that work in the same way.

You can also configure services so that they automatically shut down when they exceed a certain billing threshold, but this may pose more problems than it solves. First, navigating the AWS UI to set this up is complex. Your average free tier user may not be aware of or even interested in how to set that up. Second, you may not want to shut down services depending on how you’re using AWS.

Problem 3: Knowing What to Do Next

If it’s not your first rodeo, you might not default to panic mode when you get that surprise bill. You tracked your usage. You know you’re in the right. All you have to do is contact AWS support and dispute the charge. But imagine how a college student might react to a bill the size of their yearly tuition. While large five- to six-figure bills might be negotiable and completely waived, there are untold numbers of two- to three-figure bills that just end up getting paid because people weren’t aware of how to dispute the charges.

Even experienced developers can fall victim to unexpected charges in the thousands.

Avoiding Unexpected AWS Bills in the First Place

The first thing to recognize is that free isn’t always free. If you’re new to the platform, there are a few steps you can take to put yourself in a better position to avoid unexpected charges:

Read the fine print before spinning up servers or uploading test data.
Look for sandboxed environments that don’t let you exceed charges beyond a certain amount or that allow you to set limits that shut off services once limits are exceeded.
Proceed with caution and understand how alerts work before spinning up services.
Steer clear of free tiers completely, because the short-term savings aren’t huge and aren’t worth the added risk.

Final Thought: It Ain’t Free If They Have Your CC

AWS requires credit card information before you can do anything on the free tier—all the more reason to be extremely cautious.

Shameless plug here: Backblaze B2 Cloud Storage offers the first 10GB of storage free, and you don’t need to give us a credit card to create an account. You can also set billing alerts and caps easily in your dashboard. So, you’re unlikely to run up a surprise bill.

Ready to get started with Backblaze B2 Cloud Storage? Sign up here today to get started with 10GB and no CC.

The post Free Isn’t Always Free: A Guide to Free Cloud Tiers appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

Do More With Your Data With the Backblaze + Aparavi Joint Solution

2022-07-12 Jennifer Newman

Post Syndicated from Jennifer Newman original https://www.backblaze.com/blog/do-more-with-your-data-with-the-backblaze-aparavi-joint-solution/

It’s almost a guarantee that no data analyst, data manager, CIO, or CEO for that matter, ever uttered the words, “I wish we did less with our data.” You always want to do more—squeeze more value out of it, learn more from it, and make it work harder for you.

Aparavi helps customers do just that. The cloud-based platform is designed to unlock the value of data, no matter where it lives. Backblaze’s new partnership with Aparavi offers joint customers simple, scalable cloud storage services for unstructured data management. Read on to learn more about the partnership.

What Is Aparavi?

Aparavi is a cloud-based data intelligence and automation platform that helps customers identify, classify, optimize, and move unstructured data no matter where it resides. The platform finds, automates, governs, and consolidates distributed data easily using deep intelligence. It ensures secure access for modern data demands of analytics, machine learning, and collaboration, connecting business and IT to transform data into a competitive asset.

How Does Backblaze Integrate With Aparavi?

The Aparavi Data Intelligence and Automation Platform and Backblaze B2 Cloud Storage together provide data lifecycle management and universal data migration services. Joint customers can choose Backblaze B2 as a destination for their unstructured data.

“We are very excited about our partnership with Backblaze. This partnership will combine Aparavi’s automated and continuous data movement with Backblaze B2’s simple, scalable cloud storage services to help companies know and visualize their data, including the impact of risk, cost, and value they may or may not be aware of today.”
—Adrian Knapp, CEO and Founder, Aparavi

How Does This Partnership Benefit Joint Customers?

The partnership delivers in three key value areas:

It facilitates redundant, obsolete, trivial—commonly referred to as ROT—data cleanup, helping to reduce on-premises operational costs, redundancies, and complexities.
It recognizes personally identifiable information to deliver deeper insights into organizational data.
It enables data lifecycle management and automation to low-cost, secure, and highly available Backblaze B2 Cloud Storage.

“Backblaze helps organizations optimize their infrastructure in B2 Cloud Storage by eliminating their biggest barrier to choosing a new provider: excessive costs and complexity. By partnering with Aparavi, we can take that to the next level for our joint customers, providing cost-effective data management, storage, and access.”
—Nilay Patel, Vice President of Sales and Partnerships, Backblaze

Getting Started With Backblaze B2 and Aparavi

Ready to do more with your data affordably? Contact our Sales team today to get started.

The post Do More With Your Data With the Backblaze + Aparavi Joint Solution appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

Ransomware Takeaways From Q2 2022

2022-07-07 Jeremy Milk

Post Syndicated from Jeremy Milk original https://www.backblaze.com/blog/ransomware-takeaways-from-q2-2022/

When you’re responsible for protecting your company’s data from ransomware, you don’t need to be convinced of the risks an attack poses. Staying up to date on the latest ransomware trends is probably high on your radar. But sometimes it’s not as easy to convince others in your organization to take the necessary precautions. Protecting your data from ransomware might require operational changes and investments, and that can be hard to advance, especially when headlines report that dire predictions haven’t come true.

To help you stay up to date and inform others in your organization of the latest threats and what you can do about them, we put together five quick, timely, shareable takeaways from our monitoring over Q2 2022.

This post is a part of our ongoing series on ransomware. Take a look at our other posts for more information on how businesses can defend themselves against a ransomware attack, and more.

“Ransomware: How to Prevent or Recover From an Attack”
“Introducing the Ransomware Economy”
“Object Lock 101: Protecting Data From Ransomware”
“The True Cost of Ransomware”
2021 Ransomware Takeaways: Q1, Q2, Q3, Q4
2022 Ransomware Takeaways: Q1

1. Sanctions Are Changing the Ransomware Game

Things have been somewhat quieter on the ransomware front, and many security experts point out that the sanctions against Russia have made it harder for cybercriminals to ply their trade. The sanctions make it harder to receive payments, move money around, and provision infrastructure. As such, The Wall Street Journal reported that the ransomware economy in Russia is changing. Groups are reorganizing, splintering off into smaller gangs, and changing up the software they use to avoid detection.

Key Takeaway: Cybercriminals are working harder to avoid revealing their identities, making it challenging for victims to know whether they’re dealing with a sanctioned entity or not. Especially at a time when the federal government is cracking down on companies that violate sanctions, the best fix is to put an ironclad sanctions compliance program in place before you’re asked about it.

2. AI-powered Ransomware Is Coming

The idea of AI-powered ransomware is not new, but we’ve seen predictions in Q2 that it’s closer to reality than we might think. To date, the AI advantage in the ransomware wars has fallen squarely on the defense. Security firms employ top talent to automate ransomware detection and prevention.

Meanwhile, ransomware profits have escalated in recent years. Chainalysis, a firm that analyzes crypto payments, reported ransomware payments in excess of $692 million in 2020 and $602 million in 2021 (which they expect to continue to go up with further analysis), up from just $152 million in 2019. With business booming, some security experts warn that, while cybercrime syndicates haven’t been able to afford developer talent to build AI capabilities yet, that might not be the case for long.

They predict that, in the coming 12 to 24 months, ransomware groups could start employing AI capabilities to get more efficient in their ability to target a broader swath of companies and even individuals—small game for cybercriminals at the moment but not with the power of machine learning and automation on hand.

Key Takeaway: Small to medium-sized enterprises can take simple steps now to prevent future “spray and pray” style attacks. It may seem too easy, but fundamental steps like staying up to date on security patches and implementing multi-factor authentication can make a big difference in keeping your company safe.

3. Conti Ransomware Group Still In Business

In Q1, we reported that the ransomware group Conti suffered a data leak after pledging allegiance to Russia in the wake of the Ukraine invasion. Despite the leak of its own sensitive data, business seems to be trucking along over at Conti HQ, and the group doesn’t seem to have learned a lesson. It continues threatening to publish stolen data in return for encryption keys, a hallmark of the group’s tactics.

Key Takeaway: As detailed in ZDNet, Conti tends to exploit unpatched vulnerabilities, so, again, staying up to date on security patches is advised, as is ramping up monitoring of your networks for suspicious activity.

4. Two-thirds of Victims Paid Ransoms Last Year

New analysis from CyberEdge Group, released in Q2 and covering 2021, found that two-thirds of ransomware victims paid ransoms that year. The firm surveyed 1,200 IT security professionals and found three reasons why firms choose to make the payments:

Concerns about exfiltrated data getting out.
Increased confidence they’ll be able to recover their data.
Decreasing cost of recoveries.

When recoveries are easier, more firms are opting just to pay the attackers to go away, avoid downtime, and recover from some mix of backups and unencrypted data.

Key Takeaway: While we certainly don’t advocate for paying ransoms, having a robust disaster recovery plan in place can help you survive an attack and even avoid paying the ransom altogether.

5. Hacktivism Is on the Rise

With as much doom and gloom as we cover in the ransomware space, it seems hacking for a good cause is on the rise. CloudSEK, an AI firm, profiled the hacking group GoodWill’s efforts to force…well, some goodwill. Instead of astronomical payments in return for decryption keys, GoodWill simply asks that victims do some good in the world. One request: “Take any five less fortunate children to Pizza Hut or KFC for a treat, take pictures and videos, and post them on social media.”

Key Takeaway: While the hacktivists seem to have good intentions at heart, is it truly goodwill if it’s coerced with your company’s data held hostage? If you’ve been paying attention, you have a strong disaster recovery plan in place, and you can restore from backups in any situation. Then, consider their efforts a good reminder to revisit your corporate social responsibility program as well.

The Bottom Line: What This Means for You

Ransomware gangs are always changing tactics, and even more so in the wake of stricter sanctions. That, combined with the potential emergence of AI-powered ransomware, means a wider range of businesses could be targets in the coming months and years. As noted above, applying good security practices and developing a disaster recovery plan are excellent steps towards becoming more resilient as tactics change. And the good news, at least for now, is that not all hackers are forces for evil even if some of their tactics to spread goodwill are a bit brutish.

The post Ransomware Takeaways From Q2 2022 appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

Server Backup 101: Developing a Server Backup Strategy

2022-07-06 Kari Rivas

Post Syndicated from Kari Rivas original https://www.backblaze.com/blog/server-backup-101-developing-a-server-backup-strategy/

In business, data loss is unavoidable unless you have good server backups. Files get deleted accidentally, servers crash, computers fail, and employees make mistakes.

However, those aren’t the only dangers. You could also lose your company data in a natural disaster or cybersecurity attack. Ransomware is a serious concern for small to medium-sized businesses as well as large enterprises. Smart companies plan ahead to avoid data loss.

This post will discuss server backup basics, the different types of server backup, why it’s critical to keep your data backed up, and how to create a solid backup strategy for your company. Read on to learn everything you ever wanted to know about server backups.

First Things First: What Is a Server?

A server is a virtual or physical device that performs a function to support other computers and users. Sometimes servers are dedicated machines used for a single purpose, and sometimes they serve multiple functions. Other computers or devices that connect to the server are called “clients.” Typically, clients use special software to send requests to the server, and the server replies to them. This arrangement is referred to as the client/server model. Some common uses for this setup include:

Web Server: Hosts web pages and online applications.
Email Server: Manages email for a company.
Database Server: Hosts various databases and controls access.
Application Server: Allows users to share applications.
File Server: Used to host files shared on a network.
DNS Server: Translates domain names into IP addresses so that users reach the correct destination.
FTP Server: Used specifically for hosting files for shared use.
Proxy Server: Adds a layer of security between client and server.
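
To make the request-and-reply relationship between clients and servers concrete, here is a minimal sketch using Python's standard socketserver and socket modules. The server runs on the local machine only and simply sends back whatever text it receives in upper case; the class and function names are made up for this example, and it is not meant as a model for a production service.

```python
import socket
import socketserver
import threading


class UppercaseHandler(socketserver.StreamRequestHandler):
    """Server side: read one line from the client and reply with it upper-cased."""

    def handle(self):
        line = self.rfile.readline()
        self.wfile.write(line.upper())


def ask_server(host: str, port: int, message: str) -> str:
    """Client side: send one request line and return the server's reply."""
    with socket.create_connection((host, port), timeout=5) as conn:
        conn.sendall(message.encode() + b"\n")
        with conn.makefile("rb") as reply:
            return reply.readline().decode().strip()


def demo() -> str:
    # Port 0 asks the operating system for any free port on the loopback address.
    with socketserver.TCPServer(("127.0.0.1", 0), UppercaseHandler) as server:
        host, port = server.server_address
        threading.Thread(target=server.serve_forever, daemon=True).start()
        try:
            return ask_server(host, port, "hello server")
        finally:
            server.shutdown()


if __name__ == "__main__":
    print(demo())
```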

Servers run on many operating systems (OS) such as Windows, Linux, macOS, Unix, NetWare, and FreeBSD. The OS handles access control, user connections, memory allocation, and network functions. Each OS offers varying degrees of control, security, flexibility, and scalability.

Why It’s Important to Back Up Your Server

Did you know that roughly 40% of small and medium-sized businesses (SMBs) will be attacked by cybercriminals within a year, and 61% of all SMBs have already been attacked? Additionally, statistics show that 93% of companies that lost data for more than 10 days were forced into bankruptcy within a year. More than half of them filed immediately, and most shut down.

Company data is vulnerable to fire, theft, natural disasters, hardware failure, and cybercrime. Backups are an essential prevention tool.

Types of Servers

Within the realm of servers, there are many different types for virtually any purpose and environment. However, the primary function of most servers is data storage and processing. Some examples of servers include:

Physical Servers: These are hardware devices (usually computers) that connect users, share resources, and control access.
Virtual Servers: Using special software (called a hypervisor), you can set up multiple virtual servers on one physical machine. Each server acts like a physical server while the hypervisor manages memory and allocates other system resources as needed.
Hybrid Servers: Hybrids are servers combining physical servers and virtual servers. They offer the speed and efficiency of a physical server combined with the flexibility of cloud-hosted resources.
NAS Devices: Network-attached storage (NAS) devices store data and are accessed directly through the network without first connecting to a computer. These hardware devices contain a storage drive, processor, and OS, and can be accessed remotely.
SAN Server: Although not technically a server, a storage area network (SAN) connects multiple storage devices to multiple servers, expanding the network and controlling connections.
Cloud Servers: Cloud servers exist in a virtual online environment, and you can access them through web portals, applications, and specialized software.

Regardless of how you save your data and where, backups are essential to protecting yourself from loss.

How to Back Up a Server

You have options for backing up data, and the methods vary. First, let’s talk about terminology.

Backup vs. Archive

Backing up is copying your data, whereas an archive is a historical copy that you keep for retention purposes, often for long periods. Archives are typically used to save old, inactive data for compliance reasons.

Here are two examples that illustrate backups vs. archives. An example of a backup is when your mobile phone backs up to the cloud, and if you factory reset the phone, you can restore all your applications, settings, and data from the backup copy. An example of an archive is a tape backup of old HR files that have long since been deleted from the server.

Backup vs. Sync

Sometimes people confuse the word backup with sync. They are not the same thing. A backup is a copy of your data you can use to restore lost files. Syncing is the automatic updating and merging of two file sources. Cloud computing often uses syncing to keep files in one location identical to files in another.

To prevent data loss, use backups rather than syncing. Syncing overwrites files with the latest version, so an unwanted change or deletion is copied to the other location; a backup can restore your data to a specific point in time, so you don’t lose anything valuable.
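
The difference between syncing and backing up can be illustrated with a toy in-memory model. This is only a sketch of the idea, not how any particular sync or backup product works; the names `sync` and `BackupStore` are hypothetical.

```python
import copy


def sync(source: dict, replica: dict) -> None:
    """Make the replica identical to the source, overwriting and deleting as needed."""
    replica.clear()
    replica.update(source)


class BackupStore:
    """Keeps an independent point-in-time copy of the files for every backup run."""

    def __init__(self):
        self.snapshots = []

    def backup(self, source: dict) -> int:
        self.snapshots.append(copy.deepcopy(source))
        return len(self.snapshots) - 1  # id of the snapshot just taken

    def restore(self, snapshot_id: int) -> dict:
        return copy.deepcopy(self.snapshots[snapshot_id])


files = {"report.txt": "good version"}
replica = {}
store = BackupStore()

sync(files, replica)
first = store.backup(files)

# The file is damaged (a bad edit, ransomware, etc.), then both tools run again.
files["report.txt"] = "corrupted"
sync(files, replica)
store.backup(files)

print(replica["report.txt"])               # corrupted
print(store.restore(first)["report.txt"])  # good version
```

The synced replica faithfully mirrors the damage, while the earlier snapshot still holds the good copy.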

Backup Destinations

When selecting a backup destination, you have many mediums to choose from. There are pros and cons for each type. Some popular backup destinations and their pros and cons are as follows:

Destination	Pros	Cons
External Media (USB, CD, Removable Hard Drives, Flash Drives, etc.)	Quick, easy, affordable.	Fragile if dropped, crushed, or exposed to magnets; very small capacity.
NAS	Always available on the network, small size, and great for SMBs.	Vulnerable to on-premises threats and non-scalable due to limits.
Network or SAN Storage	High speed, view connected drives as local, good security, failover protection, excellent disk utilization, and high-end disaster recovery options.	Can be expensive, doesn’t work with all types of servers, and is vulnerable to attacks on the network.
Tape	Dependable (robust, not fragile), can be kept for years, low cost, and simple to replicate.	High initial setup costs, limited scalability, potential media corruption over time, and time consuming to manage.
FTP	Excellent for large files, copy multiple files at once, can resume if the connection is lost, schedule backups and recover lost data.	No security, vendors vary widely, not all solutions include encryption, and vulnerable to attacks.
File-sharing Services (Dropbox, OneDrive, iCloud, etc.)	Quick and easy to use; inexpensive. Great for collaborating and sharing data.	Most file-sharing services use file syncing rather than a true cloud backup.

Cloud backups are an altogether different type of backup; typically, you have two options available: all-in-one tools or integrated solutions.

All-in-one Tools

All-in-one tools like Carbonite Safe, Carbonite Server, Acronis, IDrive, CrashPlan, and SpiderOak combine both the backup software and the backend cloud storage in one offering. They have the ability to back up entire operating systems, files, images, videos, and sometimes even mobile device data. Depending on the tool you choose, you may be able to back up an unlimited number of devices, or you may have limits. However, most of these all-in-one solutions are expensive and can be complex to use. All those bells and whistles often come at a price—a steep learning curve.

Integrated Solutions (Backup Software Paired With Cloud Storage)

Pairing software and cloud storage is another option that combines the best of both worlds. It allows users to choose the software they want with the features they need and fast, reliable cloud storage. Cloud storage is scalable, so you will never run out of space as your business grows. Using your chosen software, it’s fast and easy to restore your files. Although it may seem counterintuitive, it’s often more affordable to use two integrated solutions versus an all-in-one tool. Another big bonus of using cloud storage is that it integrates with many popular backup software options; Backblaze B2 Cloud Storage, for example, works with a range of them.

An important factor to consider when choosing the right backup software and cloud storage is compatibility. Research which platforms your software will back up and what types of backups it offers (file, image, system, etc.). You also need to think about the restore process and your options (e.g., file, folder, bare metal/image, virtual, etc.). User-friendliness is also important when deciding. Some programs, like rclone, require a working knowledge of the command line. Choose the software that best fits your needs and skills.

Think about scalability and how much storage it can handle now and in the future as your business grows. A few other things to consider are pricing, security, and support. Your backup files are no good if they are vulnerable to attack. Compare prices and check out the support options before making your final decision.

Creating a Solid Backup Strategy

A solid backup strategy is the best way to protect your company against data loss. Again, you have options. The 3-2-1 strategy is the gold standard, but some companies are choosing alternatives like a 3-2-1-1-0 or even a 4-3-2 scheme. Learn more about how each plan works.
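
The 3-2-1 rule is commonly described as keeping at least three copies of your data, on at least two different types of media, with at least one copy off-site. A minimal sketch of checking a plan against that rule might look like this; the `DataCopy` class and the example plans are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataCopy:
    name: str
    media: str      # e.g. "disk", "nas", "tape", "cloud"
    offsite: bool


def satisfies_3_2_1(copies) -> bool:
    """Check the three conditions of the 3-2-1 rule as described above."""
    enough_copies = len(copies) >= 3
    enough_media = len({c.media for c in copies}) >= 2
    has_offsite = any(c.offsite for c in copies)
    return enough_copies and enough_media and has_offsite


plan = [
    DataCopy("production server", media="disk", offsite=False),
    DataCopy("local NAS backup", media="nas", offsite=False),
    DataCopy("cloud backup", media="cloud", offsite=True),
]

print(satisfies_3_2_1(plan))      # True
print(satisfies_3_2_1(plan[:2]))  # False: only two copies, none off-site
```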

Before determining your strategy, you must consider what data you need to back up. For example, will you be backing up just servers or also workstations and dedicated servers, such as email servers or SaaS data devices?

Another concern is how you will get your data into the cloud. You need to figure out which method will work best for you. You have the option of direct transfer over internet bandwidth or using a rapid ingest device (e.g., the Backblaze Fireball rapid ingest device).
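
A quick back-of-the-envelope calculation can help with that choice. The sketch below estimates how long an upload would take over an internet connection; the 80% utilization figure, the data size, and the link speeds are illustrative assumptions, and it uses decimal terabytes (10^12 bytes).

```python
def transfer_days(data_tb: float, uplink_mbps: float, utilization: float = 0.8) -> float:
    """Estimate days needed to upload data_tb terabytes over an uplink_mbps link."""
    bits = data_tb * 1e12 * 8                        # terabytes -> bits
    effective_bps = uplink_mbps * 1e6 * utilization  # usable bits per second
    return bits / effective_bps / 86_400             # seconds -> days


# Hypothetical example: 50TB of data on two different connections.
print(f"{transfer_days(50, 100):.1f} days at 100 Mbps")   # 57.9 days at 100 Mbps
print(f"{transfer_days(50, 1000):.1f} days at 1 Gbps")    # 5.8 days at 1 Gbps
```

If the estimate runs to weeks or months, a physical ingest device may be worth considering.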

Universal Data Migration

Migrating your data can seem like an insurmountable task. We launched our Universal Data Migration service to make migrating to Backblaze just as easy as it is to use Backblaze. You can migrate from virtually any source to Backblaze B2 Cloud Storage, and it’s free to new customers who have 10TB of data or more to migrate with a one-year commitment.

How Often Should You Back Up Your Data?

Should you run full backups regularly? Or rely on incremental backups? The answer is that both have their place.

To fully protect yourself, performing regular full backups and keeping them safe is essential. Full backups can be scheduled for slow times or performed overnight when no one is using the data. Remember that full backups take the longest to complete and are the costliest but the easiest to restore.

A full backup backs up the entire server. An incremental backup only backs up files that have changed or been added since the last backup, saving storage space. The cadence of full versus incremental backups might look different for each organization. Learn more about full vs. incremental, differential, and full synthetic backups.
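
To make the idea of an incremental backup concrete, here is a minimal sketch that selects only the files modified since the last backup, based on file modification times. Real backup software may detect changes differently (for example, with block-level tracking or change journals), so treat the function name and approach as illustrative assumptions.

```python
import os
import tempfile
from pathlib import Path


def files_changed_since(root, last_backup_time: float):
    """Return files under root whose modification time is newer than the last backup."""
    changed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            if path.stat().st_mtime > last_backup_time:
                changed.append(path)
    return sorted(changed)


# Demo with two temporary files and a hypothetical last-backup timestamp.
with tempfile.TemporaryDirectory() as root:
    old = Path(root) / "old.txt"
    new = Path(root) / "new.txt"
    old.write_text("unchanged since the last backup")
    new.write_text("edited after the last backup")

    last_backup_time = 1_700_000_000.0
    os.utime(old, (last_backup_time - 3600, last_backup_time - 3600))
    os.utime(new, (last_backup_time + 3600, last_backup_time + 3600))

    changed_names = [p.name for p in files_changed_since(root, last_backup_time)]

print(changed_names)  # ['new.txt']
```

A full backup would copy both files; the incremental run only needs `new.txt`.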

How Long Should You Keep Your Previous Backups?

You also must consider how long you want to keep your previous backups. Will you keep them for a specific amount of time and overwrite older backups?

By overwriting the files, you can save space, but you may not have an old enough backup when you need it. Also, keep in mind that many cloud storage vendors have minimum retention policies for deleted files. While “retention” sounds like a good thing, in this case it’s not. They might be charging you for data storage for 30, 60, or even 90 days even if you deleted it after storing it for just one day. That may also factor into your decision about how long you should keep your previous backup files. Some experts recommend three months, but that may not be enough in some situations.

You need to keep full backups for as long as you might need to recover from various issues. If, for example, you are infiltrated by a cybercriminal and don’t discover it for two months, will your oldest backup be enough to restore your system back to a clean state?
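
The scenario above can be checked with a few lines of code. This sketch assumes weekly backups and compares two hypothetical retention windows against a compromise that happened 60 days ago; the dates and function names are made up for illustration.

```python
from datetime import date, timedelta


def latest_clean_backup(backup_dates, compromised_on):
    """Return the newest backup taken before the compromise, or None if none exists."""
    clean = [d for d in backup_dates if d < compromised_on]
    return max(clean) if clean else None


today = date(2022, 9, 1)
compromised_on = today - timedelta(days=60)   # intrusion went unnoticed for two months


def weekly_backups(retention_days: int):
    """Weekly backups still on hand given a retention window in days."""
    return [today - timedelta(days=d) for d in range(0, retention_days + 1, 7)]


print(latest_clean_backup(weekly_backups(30), compromised_on))  # None
print(latest_clean_backup(weekly_backups(90), compromised_on) is not None)  # True
```

With 30 days of retention, every remaining backup was taken after the intrusion; with 90 days, a clean restore point still exists.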

Another question to think about is if you’ll keep an archive. As a refresher, an archive is a backup of historical data that you keep long-term even if the files have already been deleted from the server. Most sources say you should plan to keep archives forever unless you have no use for the data in the future, but your company might have a different appetite for retention timeframes. Forever probably seems like…well, a long time, but keep in mind that the security of having those files available may be worth it.

How Will You Monitor Your Backup?

It’s not enough to just schedule your backups and walk away. You need to monitor them to ensure they are occurring on schedule. You should also test your ability to restore and fully understand the options you have for restoring your data. A backup is only as good as its ability to restore. You must test this out periodically to ensure you have a solid disaster recovery plan in place.
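
One simple way to test a restore is to compare checksums of the original and restored files. The sketch below uses SHA-256 from the Python standard library; the file names are hypothetical, and in practice you might record checksums at backup time, since the original may no longer exist when you need to restore.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def restore_matches(original: Path, restored: Path) -> bool:
    """True if the restored file is byte-for-byte identical to the original."""
    return sha256_of(original) == sha256_of(restored)


# Demo: one good restore and one damaged restore.
with tempfile.TemporaryDirectory() as tmp:
    original = Path(tmp) / "payroll.db"
    good_restore = Path(tmp) / "payroll.restored.db"
    bad_restore = Path(tmp) / "payroll.damaged.db"

    original.write_bytes(b"important records")
    good_restore.write_bytes(b"important records")
    bad_restore.write_bytes(b"important recor")   # truncated during restore

    good_ok = restore_matches(original, good_restore)
    bad_ok = restore_matches(original, bad_restore)

print(good_ok, bad_ok)  # True False
```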

Special Considerations for Backing Up

When backing up servers with different operating systems, you need to consider the constraints of each system. For example, SQL servers can handle differential backups, whereas some other servers cannot. Some backup software, like Veeam, integrates easily with all the major operating systems and therefore supports backups of multiple servers using different platforms.

If you are backing up a single server, things are easy. You have only one OS to worry about. However, if you are backing up multiple servers with different platforms and applications running on them, things could get more complex. Be sure to research all your options and use a vendor that can easily handle group management and SaaS-managed backup services so that you can view all your data through a single pane of glass. You want consolidation and easy delineation if you need to pinpoint a single system to restore. You can use groups to manage different servers with similar operating systems, which keeps things organized and streamlines your backup strategy.

As you can see, there are many facets to server backups, and you have options. If you have questions or want to learn more about Backblaze backup solutions, contact us today. Or, click here if you’re ready to get started backing up your server.

The post Server Backup 101: Developing a Server Backup Strategy appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

Cloud Storage Pricing: What You Need to Know

2022-06-30 Molly Clancy

Post Syndicated from Molly Clancy original https://www.backblaze.com/blog/cloud-storage-pricing-what-you-need-to-know/

Between tech layoffs and recession fears, economic uncertainty is at a high. If you’re battening down the hatches for whatever comes next, you might be taking a closer look at your cloud spend. Even before the bear market, 59% of cloud decision makers named “optimizing existing use of cloud (cost savings)” as their top cloud initiative of 2022 according to the Flexera State of the Cloud report.

Cloud storage is one piece of your cloud infrastructure puzzle, but it’s one where some simple considerations can save you anywhere from 25% up to 80%. As such, understanding cloud storage pricing is critical when you are comparing different solutions. When you understand pricing, you can better decide which provider is right for your organization.

In this post, we won’t look at 1:1 comparisons of cloud storage pricing, but you can check out a price calculator here. Instead, you will learn tips to help you make a good cloud storage decision for your organization.

Evaluating Your Cloud Storage? Gather These Facts

Looking at the pricing options of different cloud providers only makes sense when you know your needs. Use the following considerations to clarify your storage needs to approach a cloud decision thoughtfully:

1. How do you plan to use cloud storage?
2. How much does cloud storage cost?
3. What features are offered?

1. How Do You Plan to Use Cloud Storage?

Some popular use cases for cloud storage include:

Backup and archive.
Origin storage.
Migrating away from LTO/tape.
Managing a media workflow.

Backup and Archive

Maintaining data backups helps make your company more resilient. You can more easily recover from a disaster and keep serving customers. The cloud provides a reliable, off-site place to keep backups of your company workstations, servers, NAS devices, and Kubernetes environments.

Case Study: Famed Photographer Stores a Lifetime of Work

Photographer Steve McCurry, renowned for his 1984 photo of the “Afghan Girl” which has been on the cover of National Geographic several times, backed up his life’s work in the cloud when his team didn’t want to take chances with his irreplaceable archives.

Origin Storage

If you run a website, video streaming service, or online gaming community, you can use the cloud to serve as your origin store where you keep content to be served out to your users.

Case Study: Serving 1M+ Websites From Cloud Storage

Big Cartel hosts more than one million e-commerce websites. To increase resilience, the company recently started using a second cloud provider. By adopting a multi-cloud infrastructure, the business now has lower costs and less risk of failure.

Migrating Away From LTO/Tape

Managing a tape library can be time-consuming and comes with high CapEx spending. With inflation, replacing tapes costs more, shipping tapes off-site costs more, and physical storage space costs more. The cloud provides an affordable alternative to storing data on tape: you pass those rising costs off to a cloud provider, who has to worry about provisioning enough physical storage devices and space, while you pay as you go.

Managing Media Workflow

Your department or organization may need to work with large media files to create movies or digital videos. Cloud storage provides an alternative to provisioning huge on-premises servers to handle large files.

Case Study: Using the Cloud to Store Media

Hagerty Insurance stored a huge library of video assets on an aging server that couldn’t keep up. They implemented a hybrid cloud solution for cloud backup and sync, saving the team over 200 hours per year searching for files and waiting for their slow server to respond.

2. How Much Does Cloud Storage Cost?

Cloud storage costs are calculated in a variety of different ways. Before considering any specific vendors, knowing the most common options, variables, and fees is helpful, including:

Flat or single-tier pricing vs. tiered pricing.
Hot vs. cold storage.
Storage location.
Minimum retention periods.
Egress fees.

Flat or Single-tier Pricing vs. Tiered Pricing

A flat or single-tier pricing approach charges the user based on the storage volume, and cost is typically expressed per gigabyte stored. There is only one tier, making budgeting and planning for cloud expenses simple.

On the other hand, some cloud storage services use a tiered storage pricing model. For example, a provider may have a small business pricing tier and an enterprise tier. Note that different pricing tiers may include different services and features. Today, your business might use an entry-level pricing tier but need to move to a higher-priced tier as you produce more data.

Hot vs. Cold Storage

Hot storage is helpful for data that needs to be accessible immediately (e.g., last month’s customer records). By contrast, cold storage is helpful for data that does not need to be accessed quickly (e.g., tax records from five years ago). For more insight on hot vs. cold storage, check out our post: “What’s the Diff: Hot and Cold Data Storage.” Generally speaking, cold storage is the cheapest, but that low price comes at the cost of speed. For data that needs to be accessed frequently or even for data where you’re not sure how often you need access, hot storage is better.

Storage Location

Some organizations need their cloud storage to be located in the same country or region due to regulations or just preference. But some storage vendors charge different prices to store data in different regions. Keeping data in a specific location may impact cloud storage prices.

Minimum Retention Periods

Most folks think of “retention” as a good thing, but some storage vendors enforce minimum retention periods that essentially impose penalties for deleting your data. Some vendors enforce minimum retention periods of 30, 60, or even 90 days. Deleting your data could cost you a lot, especially if you have a backup approach where you retire older backups before the retention period ends.
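
A small calculation shows how a minimum retention period can inflate the bill for short-lived backups. The sketch below assumes storage is prorated by day over a 30-day month and uses a made-up price per gigabyte; actual vendor billing rules and rates vary.

```python
def billed_days(days_stored: int, minimum_retention_days: int) -> int:
    """You pay for at least the minimum retention period, even after deleting early."""
    return max(days_stored, minimum_retention_days)


def storage_charge(size_gb, price_per_gb_month, days_stored, minimum_retention_days=0):
    """Charge for one object, prorated by day (assumes a 30-day month)."""
    days = billed_days(days_stored, minimum_retention_days)
    return size_gb * price_per_gb_month * days / 30


# Hypothetical: a 1,000GB backup deleted after 7 days at $0.004 per GB per month.
no_minimum = storage_charge(1000, 0.004, days_stored=7)
with_minimum = storage_charge(1000, 0.004, days_stored=7, minimum_retention_days=90)

print(f"${no_minimum:.2f} with no minimum")           # $0.93 with no minimum
print(f"${with_minimum:.2f} with a 90-day minimum")   # $12.00 with a 90-day minimum
```

In this example, the same week-old backup costs roughly 13 times more under a 90-day minimum.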

Egress Fees

Cloud companies charge egress fees when customers want to move their data out of the provider’s platform. These fees can be egregiously high, making it expensive for customers to use multi-cloud infrastructures and therefore locking customers into their services.
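
To see how egress fees affect a move between providers, you can compare a month of storage with the one-time cost of downloading everything. The prices below are purely illustrative assumptions, not any vendor’s actual rates.

```python
def monthly_storage_cost(stored_gb: float, storage_price_per_gb: float) -> float:
    """Cost of keeping stored_gb in the cloud for one month."""
    return stored_gb * storage_price_per_gb


def egress_cost(downloaded_gb: float, egress_price_per_gb: float) -> float:
    """Cost of moving downloaded_gb out of the provider's platform."""
    return downloaded_gb * egress_price_per_gb


# Hypothetical rates: $0.02 per GB per month for storage, $0.09 per GB for egress.
stored_gb = 10_000
storage = monthly_storage_cost(stored_gb, 0.02)
migration = egress_cost(stored_gb, 0.09)

print(f"Monthly storage: ${storage:,.2f}")            # Monthly storage: $200.00
print(f"Moving everything out: ${migration:,.2f}")    # Moving everything out: $900.00
```

At these example rates, leaving costs more than four months of storage, which is the kind of lock-in effect described above.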

3. What Additional Features Are Offered?

While price is likely one of your biggest considerations, choosing a cloud storage provider solely based on price can lead to disappointment. There are specific cloud storage features that can make a big difference in your productivity, security, and convenience. Keep these features and capabilities in mind when comparing different cloud storage solutions.

Security Features

You may be placing highly sensitive data like financial records and customer service data in the cloud, so features like server-side encryption could be important. In addition, you might look for a provider that offers Object Lock so you can protect data using a Write Once, Read Many (WORM) model.

Data Speed

Find out what upload and download speeds the cloud storage provider can deliver. Keep in mind that the speed of your internet connection also impacts how fast you can access data. Data speed is critically important in several industries, including media and live streaming.

Customer Support

If your company has a data storage problem outside of regular business hours, customer support becomes critically important. What level of support can you expect from the provider? Do they offer expanded support tiers?

Partner Integrations

Partner integrations make it easier to manage your data. Check if the cloud storage provider has integrations with services you already use.

The Next Step in Choosing Cloud Storage

Understanding cloud storage pricing requires a holistic view. First, you need to understand your organization’s data needs. Second, it is wise to understand the typical cloud storage pricing models commonly used in the industry. Finally, cloud storage pricing needs to be understood in the context of features like security, integrations, and customer service. Once you consider these steps, you can approach a decision to switch cloud providers or optimize your cloud spend more rigorously and methodically.

The post Cloud Storage Pricing: What You Need to Know appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.