
What Are Containers?

Post Syndicated from Molly Clancy original https://www.backblaze.com/blog/what-are-containers/

You’re probably familiar with containers if you’re even remotely involved in software development or systems administration. In its 2020 survey, the Cloud Native Computing Foundation found that the use of containers in production had increased to 92% of respondents, up 300% since its first survey in 2016.

But, even if containers are a regular part of your day-to-day life, or if you, like me, are a worthy student just trying to understand what an operating system kernel is, it helps to have an understanding of some container basics.

Today, we’re explaining what containers are, how they’re used, and how cloud storage fits into the container picture—all in one neat and tidy containerized blog post package. (And, yes, the kernel is important, so we’ll get to that, too).

What Are Containers?

Containers are packaged units of software that contain all of the dependencies (e.g. binaries, libraries, programming language versions, etc.) they need to run no matter where they live—on a laptop, in the cloud, or in an on-premises data center. That’s a fairly technical definition, so you might be wondering, “OK, but what are they really?”

Well, typically, when it comes to naming things, the tech world is second only to the world of pharmaceuticals in terms of absurdity (I see your Comirnaty, and I’ll raise you a Kubernetes, good sir… more on that later). But sometimes, we get it right, and containers are one example.

The generally accepted definition of the term applies almost exactly to what the technology does.

A container, generally = a receptacle for holding goods; a portable compartment in which freight is placed (as on a train or ship) for convenience of movement.

A container in software development = a figurative “receptacle” for holding software. The second part of the definition applies even better: shipping containers are often used as a metaphor to describe what containers do. In shipping, instead of stacking goods all higgledy-piggledy, goods are packed into standard-sized containers that fit on whatever is hauling them, whether a ship, a train, or a trailer.

Likewise, instead of “shipping” an unwieldy mess of code, including the required operating system, containers package software into lightweight units that share the host machine’s operating system (OS) kernel and can run anywhere (on a laptop, on a server, in the cloud, etc.).

What’s an OS Kernel?

As promised, here’s where the OS kernel becomes important. The kernel is the core program at the center of the OS, and it controls all other parts of the OS. In yet another example of naming things in tech, the term makes sense if you consider the definition of “kernel” as “the central or essential part,” as in “a kernel of truth,” but it also raises the question, “Why didn’t they just call it a colonel? Because now all I can envision is a popcorn seed inside a computer.” …But that’s neither here nor there, and now you know what an OS kernel does.

Compared to older virtualization technology, namely virtual machines (VMs), which are typically measured in gigabytes, containers are often only megabytes in size. That means you can run quite a few of them on a given computer or server, much like you can stack many containers onto a ship.

Indeed, the founders of Docker, the software that sparked widespread container adoption, looked to the port of Oakland for inspiration. Former Docker CEO Ben Golub explained in an interview with InfoWorld, “We could see all the container ships coming into the port of Oakland, and we were talking about the value of the container in the world of shipping. The fact it was easier to ship a car from one side of the world than to take an app from one server to another, that seemed like a problem ripe for solving.” In fact, it’s right there in their logo.

Source: Docker logos.

And that is exactly what containers, mainly via Docker’s popularity, did—they solved the problem of environment inconsistency for developers. Before containers became widely used, moving software between environments meant things broke, a lot. If a developer wrote an app on their laptop, then moved it into a testing environment on a server, for example, everything had to be the same—same versions of the programming language, same permissions, same database access, etc. If not, you would have had a very sad app.

Virtualization 101

Containers work their magic by way of virtualization. Virtualization is the process of creating a simulated computing environment that’s abstracted from the physical computing hardware—essentially a computer-generated computer, also referred to as a software-defined computer.

The first virtualization technology to really take off was the virtual machine (VM). A VM sits atop a hypervisor, a lightweight software layer that allows multiple operating systems to run in tandem on the same hardware. VMs allow developers and system administrators to make the most of computing hardware. Before VMs, each application typically ran on its own server, and it probably didn’t use the server’s full capacity. After VMs, you could use the same server to run multiple applications, increasing efficiency and lowering costs.

Containers vs. Virtual Machines

While VMs increase hardware efficiency, each VM requires its own OS and a virtualized copy of the underlying hardware. Because of this, VMs can take up a lot of system resources, and they’re slow to start up.

Containers, on the other hand, virtualize the OS rather than the hardware: they share the host’s OS kernel instead of each running a full OS of their own. Because of this, containers are much smaller and faster to spin up than VMs. Want to know more? Check out our deep dive into the differences between VMs and containers.

The Benefits of Containers

Containers allow developers and system administrators to develop, test, and deploy software and applications faster and more efficiently than older virtualization technologies like VMs. The benefits of containers include:

  1. Portability: Containers include all of the dependencies they need to run in any environment, provided that environment has a compatible OS kernel (Linux containers, for example, need a Linux kernel). This reduces the errors and bugs that arise when moving applications between different environments, increasing portability.
  2. Size: Containers share the host’s OS kernel and don’t boot a full OS of their own, making them lightweight: megabytes compared to VMs’ gigabytes. As such, one machine or server can support many containers.
  3. Speed: Again, because they share the host’s kernel rather than booting their own OS, containers can be spun up in seconds, compared to VMs, which can take minutes to spin up.
  4. Resource efficiency: Similar to VMs, containers allow developers to make the best use of hardware and software resources.
  5. Isolation: Also similar to VMs, with containers, different applications or even component parts of a single application can be isolated such that issues like excessive load or bugs on one don’t impact others.

Container Use Cases

Containers are nothing if not versatile, so they can be used for a wide variety of use cases. However, there are a few instances where containers are especially useful:

  1. Enabling microservices architectures: Before containers, applications were typically built as all-in-one units or “monoliths.” With their portability and small size, containers changed that, ushering in the era of microservices architecture. Applications could be broken down into their component “services,” and each of those services could be built in its own container and run independently of the other parts of the application. For example, the code for your application’s search bar can be built separately from the code for your application’s shopping cart, then loosely coupled to work as one application.
  2. Supporting modern development practices: Containers and the microservices architectures they enable paved the way for modern software development practices. With the ability to split applications into their component parts, each part could be developed, tested, and deployed independently. Thus, developers can build and deploy applications using modern development approaches like DevOps, continuous integration/continuous deployment (CI/CD), and agile development.
  3. Facilitating hybrid cloud and multi-cloud approaches: Because of their portability, containers enable developers to utilize hybrid cloud and/or multi-cloud approaches. Containers allow applications to move easily between environments—from on-premises to the cloud or between different clouds.
  4. Accelerating cloud migration or cloud-native development: Existing applications can be refactored using containers to make them easier to migrate to modern cloud environments. Containers also enable cloud-native development and deployment.

Container Tools

The two most widely recognized container tools are Docker and Kubernetes. They’re not the only options out there, but in its 2021 Developer Survey, Stack Overflow found that nearly 50% of its 76,000+ respondents use Docker and 17% use Kubernetes. But what do they do?

1. What Is Docker?

Container technology has been around for a while in the form of Linux containers or LXC, but the widespread adoption of containers happened only in the past five to seven years with the introduction of Docker.

Docker was launched in 2013 as a project to build single-application LXC containers, introducing several changes to LXC that make containers more portable and flexible to use. It later morphed into its own container runtime environment. At a high level, Docker is a Linux utility that can efficiently create, ship, and run containers.

Docker introduced more standardization to containers than previous technologies and focused on developers, specifically, making it the de facto standard in the developer world for application development.

2. What Is Kubernetes?

As containerization took off, many early adopters found themselves facing a new problem: how to manage a whole bunch of containers. Enter: Kubernetes. Kubernetes is an open-source container orchestrator. It was developed at Google (deploying billions of containers per week is no small task) as a “minimum viable product” version of their original cluster orchestrator, ominously named Borg. Today, it is managed by the Cloud Native Computing Foundation, and it helps automate management of containers including provisioning, load balancing, basic health checks, and scheduling.

Kubernetes allows developers to describe the desired state of a container deployment using YAML files. (YAML originally stood for Yet Another Markup Language and now officially stands for YAML Ain’t Markup Language, which is yet another winning example of naming things in tech.) The YAML file uses declarative language to tell Kubernetes “this is what this container deployment should look like,” and Kubernetes does all the grunt work of creating and maintaining that state.
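The declarative idea can be sketched in a few lines. Below, the desired state is a Python dict shaped like a Kubernetes Deployment manifest (apiVersion apps/v1, kind Deployment, spec.replicas), which in practice you would write in YAML. The Cluster class, the container names, and the reconcile function are hypothetical stand-ins, a heavily simplified illustration of the compare-and-correct loop that Kubernetes controllers run, not Kubernetes code.

```python
from dataclasses import dataclass, field

# A Python dict mirroring the shape of a Kubernetes Deployment manifest, which you
# would normally write in YAML. The name and image are hypothetical.
desired_manifest = {
    "apiVersion": "apps/v1",
    "kind": "Deployment",
    "metadata": {"name": "catblaze-web"},
    "spec": {
        "replicas": 3,
        "template": {
            "spec": {"containers": [{"name": "web", "image": "catblaze/web:1.2"}]}
        },
    },
}


@dataclass
class Cluster:
    """Toy stand-in for what is actually running: container ID -> image."""
    running: dict = field(default_factory=dict)
    next_id: int = 0

    def start(self, image: str) -> str:
        cid = f"c{self.next_id}"
        self.next_id += 1
        self.running[cid] = image
        return cid

    def stop(self, cid: str) -> None:
        del self.running[cid]


def reconcile(manifest: dict, cluster: Cluster) -> list:
    """One pass of a toy control loop that moves actual state toward desired state."""
    spec = manifest["spec"]
    want = spec["replicas"]
    image = spec["template"]["spec"]["containers"][0]["image"]
    actions = []
    # Replace containers that are running an outdated image.
    for cid, img in list(cluster.running.items()):
        if img != image:
            cluster.stop(cid)
            actions.append(f"stop {cid}")
    # Start or stop containers until the replica count matches the manifest.
    while len(cluster.running) < want:
        actions.append(f"start {cluster.start(image)}")
    while len(cluster.running) > want:
        cid = sorted(cluster.running)[-1]
        cluster.stop(cid)
        actions.append(f"stop {cid}")
    return actions
```

Calling reconcile on an empty Cluster() starts three containers. If one then "crashes" (cluster.stop("c1")), the next pass starts a replacement, and a pass with nothing to fix returns an empty list. Because the loop only compares desired and actual state, you describe the end result rather than the steps to get there.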

Containers + Storage: What You Need to Know

Containers are inherently ephemeral or stateless. They get spun up, and they do their thing. When they get spun down, any data written inside the container while it was running is destroyed with it, unless that data was saved somewhere outside the container. But most applications are stateful and need data to live on even after a given container goes away.

Object storage is inherently scalable. It enables the storage of massive amounts of unstructured data while still maintaining easy data accessibility. For containerized applications that depend on data scalability and accessibility, it’s an ideal solution for keeping stateful data stateful.

There are three essential use cases where object storage works hand in hand with containerized applications:

  1. Backup and Disaster Recovery: Tools like Docker and Kubernetes enable easy replication of containers, but replication doesn’t replace traditional backup and disaster recovery, just as sync services aren’t a good replacement for backing up the data on your laptop, for example. With object storage, you can replicate your entire environment and back it up to the cloud. There’s just one catch: some object storage providers have retention minimums, sometimes up to 90 days. If you’re experimenting and iterating on your container architecture, or if you use CI/CD methods, your environment is constantly changing. With retention minimums, that means you might be paying for previous iterations much longer than you want to. (Shameless plug: Backblaze B2 Cloud Storage usage is calculated hourly, with no minimum retention requirement.)
  2. Primary Storage: You can use a cloud object storage repository to store your container images. Then, when you want to deploy them, you can pull them into the compute service of your choice.
  3. Origin Storage: If you’re serving out high volumes of media, or even if you’re just hosting a simple website, object storage can serve as your origin store coupled with a CDN for serving out content globally. For example, CloudSpot, a SaaS platform that serves professional photographers, moved to a Kubernetes cluster environment and connected it to their origin store in Backblaze B2, where they now keep 120+ million files readily accessible for their customers.
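The retention-minimum catch in the first item is easy to quantify. A minimal sketch, assuming a provider that bills every stored object for at least minimum_days even if you delete it sooner, plus a placeholder price per GB-month (no real prices are used); the function names and the snapshot scenario are hypothetical.

```python
def billed_gb_days(size_gb: float, days_stored: float, minimum_days: float = 0) -> float:
    """GB-days charged for one object deleted after days_stored, when the provider
    bills at least minimum_days of storage (0 means no minimum)."""
    return size_gb * max(days_stored, minimum_days)


def snapshots_cost(size_gb: float, days_kept: float, snapshots: int,
                   minimum_days: float, price_per_gb_month: float) -> float:
    """Cost of a series of environment snapshots, each deleted after days_kept.
    A month is approximated as 30 days; the price is a placeholder you supply."""
    gb_days = snapshots * billed_gb_days(size_gb, days_kept, minimum_days)
    return gb_days / 30 * price_per_gb_month


# Ten 100 GB snapshots, each replaced after 7 days, with and without a 90-day
# minimum, at a placeholder price of 1.0 per GB-month.
with_minimum = snapshots_cost(100, 7, 10, minimum_days=90, price_per_gb_month=1.0)
no_minimum = snapshots_cost(100, 7, 10, minimum_days=0, price_per_gb_month=1.0)
print(round(with_minimum / no_minimum, 1))  # 12.9: you pay for 90 days of each 7-day snapshot
```

The ratio is simply 90 / 7, so the faster your environment changes, the larger the gap between what you store and what you pay for.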

Need Object Storage for Your Containerized Application?

Now that you have a handle on what containers are and what they can do, you can make decisions about how to build your applications or structure your internal systems. Whether you’re contemplating moving your application to the cloud, adopting a hybrid or multi-cloud approach, or going completely cloud native, containers can help you get there. And with object storage, you have a data repository that can keep up with your containerized workloads.

Ready to connect your application to scalable, S3-compatible object storage? You can get started today, and the first 10GB are free.

The post What Are Containers? appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

Drive Failure Over Time: The Bathtub Curve Is Leaking

Post Syndicated from Andy Klein original https://www.backblaze.com/blog/drive-failure-over-time-the-bathtub-curve-is-leaking/

From time to time, we will reference the “bathtub curve” when talking about hard drive and SSD failure rates. This normally includes a reference or link back to a post we did in 2013 which discusses the topic. It’s time for an update. Not because the bathtub curve itself has changed, but because we have nearly seven times the number of drives and eight more years of data than we did in 2013.

In today’s post, we’ll take an updated look at how well hard drive failure rates fit the bathtub curve, and in a few weeks we’ll delve into the specifics for different drive models and even do a little drive life expectancy analysis.

Once Upon a Time, There Was a Bathtub Curve

Here is the classic version of the bathtub curve.

Source: Public domain, https://commons.wikimedia.org/w/index.php?curid=7458336.

The curve is divided into three sections: decreasing failure rate, constant failure rate, and increasing failure rate. Using our 2013 drive stats data, we computed a failure rate and a timeframe for each of the three sections as follows:

2013 Drive Failure Rates

Curve Section | Annual Failure Rate | Drive Age
Decreasing | 5.1% | 0 to 18 months
Constant | 1.4% | 18 months to 3 years
Increasing | 11.8% | 3 to 4 years

Furthermore, we computed that about 80% of the hard drives in our system would survive four years and, forecasting that out, about 50% would survive six years. In other words, we would expect a hard drive we installed to have a 50% chance of being alive after six years.
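The four-year figure can be roughly reproduced from the 2013 table. A minimal sketch, assuming each rate in the table is an annual failure rate that applies uniformly across its section and compounds, so surviving t years at rate r has probability (1 - r)^t. This is a simplification for illustration, not necessarily the method behind the 2013 numbers, and it does not attempt the six-year forecast.

```python
# (annual failure rate, years spent in that section), read from the 2013 table
SECTIONS_2013 = [
    (0.051, 1.5),  # decreasing: 0 to 18 months
    (0.014, 1.5),  # constant: 18 months to 3 years
    (0.118, 1.0),  # increasing: 3 to 4 years
]


def survival(sections):
    """Probability of surviving every section, compounding each annual rate."""
    p = 1.0
    for annual_rate, years in sections:
        p *= (1 - annual_rate) ** years
    return p


print(f"{survival(SECTIONS_2013):.1%}")  # 79.8%
```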

Drive Failure and the Bathtub Curve Today

Let’s begin by comparing the drive failure rates over time based on the data available to us in 2013 and the data available to us today in 2021.

Observations and Thoughts

  • Let’s start with an easy one: We have six years’ worth of data for 2021 versus four years for 2013, so we have a wider bathtub. In reality, it is even wider, as we have more than six years of data available to us, but after six years the number of data points (drive failures) is small: fewer than 10 failures per quarter.
  • The left side of the bathtub, the area of “decreasing failure rate,” is dramatically lower in 2021 than in 2013. In fact, for our 2021 curve, there is almost no left side of the bathtub, making it hard to take a bath, to say the least. We have reported how Seagate breaks in and tests their newly manufactured hard drives before shipping in an effort to lower the failure rates of their drives. Assuming all manufacturers do the same, that may explain some or all of this observation.
  • The right side of the bathtub, the area of “increasing failure rate,” has moved to the right in 2021. Clearly, drives installed after 2013 are not failing as often in years three and four, or most of year five for that matter. We think the earlier rise in 2013 may have something to do with the aftermath of the Thailand drive crisis back in 2011: drives got expensive, quality (in the form of reduced warranty periods) went down, and there was a fair amount of manufacturer consolidation as well.
  • It is interesting that for year two, the two curves, 2013 and 2021, line up very well. We think this is because there really is a period in the middle in which the drives just work. It was just shorter in 2013 due to the factors noted above.

The Life Expectancy of Drives Today

As noted earlier, back in 2013, 80% of the drives installed would be expected to survive four years, falling to 50% after six years. In 2021, the probability of a hard drive still being alive at six years is 88%. That’s a substantial increase, but it basically comes down to the fact that hard drives are failing less in our system. We think it is a combination of better drives, better storage servers, and better practices by our data center teams.
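Another way to compare the two eras is to ask what constant annual failure rate would produce each six-year survival figure. A back-of-the-envelope sketch, assuming a single rate that compounds yearly (the bathtub and hockey-stick shapes show that real rates vary with drive age); the function name is hypothetical.

```python
def implied_annual_failure_rate(survival_prob: float, years: float) -> float:
    """Constant annual failure rate r such that (1 - r) ** years == survival_prob."""
    return 1 - survival_prob ** (1 / years)


# Six-year survival: 50% (2013 forecast) vs. 88% (2021), from the figures above.
r_2013 = implied_annual_failure_rate(0.50, 6)
r_2021 = implied_annual_failure_rate(0.88, 6)
print(f"2013: {r_2013:.1%} per year, 2021: {r_2021:.1%} per year")
# 2013: 10.9% per year, 2021: 2.1% per year
```

That is roughly a fivefold drop in the average yearly failure rate implied by the two survival figures.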

What’s Next

For 2021, our bathtub curve looks more like a hockey stick, although saying, “When you review our hockey stick curve…” doesn’t sound quite right. We’ll try to figure out something by our next post on the topic. One thing we also want to do in that next post is to break down the drive failure data by model and see if the different drive models follow the bathtub curve, the hockey stick curve, or some other unnamed curve. We’ll also chart out the life expectancy curves for all the drives as a whole and by drive model as well.

Well, time to get back to the data. Our next Drive Stats report is coming up soon.

The post Drive Failure Over Time: The Bathtub Curve Is Leaking appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

How to Add Object Lock to Your IT Security Policy

Post Syndicated from Molly Clancy original https://www.backblaze.com/blog/how-to-add-object-lock-to-your-it-security-policy/

Object Lock is a powerful backup protection tool that makes data immutable. It allows you to store objects using a Write Once, Read Many (WORM) model, meaning that after data is written, it cannot be modified or deleted for a defined period of time. Any attempt to manipulate, encrypt, change, or delete the file will fail during that time. The file can still be accessed, but no one can change it, including the file owner or whoever set the Object Lock.
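To make the WORM model concrete, here is a toy in-memory store that refuses writes and deletes until an object’s retain-until time has passed, while always allowing reads. It is a hypothetical illustration, not Backblaze’s or any provider’s API. For simplicity it keeps one version per key and therefore blocks overwrites outright, whereas object storage with Object Lock typically relies on versioning and protects the locked version.

```python
from datetime import datetime, timedelta, timezone


class ObjectLockedError(Exception):
    """Raised when a write or delete hits an object that is still under retention."""


class WormStore:
    """Toy in-memory store that enforces Write Once, Read Many retention.
    Keeps one version per key, so it blocks overwrites outright."""

    def __init__(self):
        self._objects = {}  # key -> (data, retain_until as a timezone-aware datetime)

    def _check_unlocked(self, key: str) -> None:
        if key in self._objects:
            retain_until = self._objects[key][1]
            if datetime.now(timezone.utc) < retain_until:
                raise ObjectLockedError(f"{key} is locked until {retain_until.isoformat()}")

    def put(self, key: str, data: bytes, retain_until: datetime) -> None:
        self._check_unlocked(key)  # no overwriting a locked object
        self._objects[key] = (data, retain_until)

    def get(self, key: str) -> bytes:
        return self._objects[key][0]  # reads are always allowed

    def delete(self, key: str) -> None:
        self._check_unlocked(key)  # no deleting a locked object
        del self._objects[key]


store = WormStore()
store.put("backup.tar", b"v1", datetime.now(timezone.utc) + timedelta(days=30))
```

Reading backup.tar works immediately; deleting or overwriting it raises ObjectLockedError until 30 days have passed, no matter who asks.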

This makes Object Lock a great tool as part of a robust cybersecurity program. However, when Object Lock is used inconsistently, it can consume unnecessary storage resources. For example, if you set a retention period of one year, but you don’t end up needing to keep the data that long, you’re out of luck. Once the file is locked, it cannot be deleted. That’s why it’s important to develop a consistent approach.

In this post, we’ll outline five different use cases for Object Lock and explain how to add Object Lock to your IT security policies to ensure your company gets all the protection Object Lock offers while managing your storage footprint.

When to Use Object Lock: Five Use Cases

There are at least five situations where Object Lock is helpful. Keep in mind that these requirements may change over time. Compliance requirements, for example, might be relatively simple today. However, those requirements may become more complex if your company onboards customers in a highly regulated sector like finance or health care.

1. Reducing Cybersecurity Risk

Cybersecurity threats are increasing. In 2015, there were approximately 1,000 ransomware attacks per day, but this figure has increased to more than 4,000 per day since 2016, according to the U.S. government. To be clear, using Object Lock does not prevent a ransomware attack. What it does is make protected data immutable, so that data cannot be altered by malicious software in the event of an attack. Ultimately, your organization may be able to recover from a cyber attack more quickly by restoring data protected by Object Lock.

2. Meeting Compliance Requirements

Some industries have extensive record retention requirements. Preserving digital records with Object Lock is one way to fulfill those expectations. Several regulatory and legal requirements direct companies to retain records for a certain period of time.

  • Banks insured by the FDIC generally must retain account records for five years after the “account is closed or becomes dormant.” Beyond FDIC, there are many other state and federal compliance requirements on the financial industry. Preserving data with Object Lock can be helpful in these situations.
  • In the health care field, requirements vary across the country. The American Health Information Management Association points out that retaining health records for up to 10 years or longer may be needed.
  • You may also have to retain data for tax purposes. The IRS generally suggests keeping tax-related records for up to seven years. However, there are nuances to these requirements (e.g., shorter retention periods in some cases and potentially longer retention periods for property records).
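When a rule is phrased as “N years after some event,” turning it into a concrete retain-until date for Object Lock is simple date arithmetic. A minimal sketch: the record types and the policy table are hypothetical, and the periods are simply the ones mentioned in the list above; your actual requirements may differ.

```python
from datetime import date


def add_years(d: date, years: int) -> date:
    """Add calendar years, moving Feb 29 to Feb 28 when the target year isn't a leap year."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:  # Feb 29 in a non-leap target year
        return d.replace(year=d.year + years, day=28)


# Hypothetical policy table using the periods mentioned above.
RETENTION_YEARS = {
    "bank_account_record": 5,  # counted from when the account closes or goes dormant
    "health_record": 10,
    "tax_record": 7,
}


def retain_until(record_type: str, trigger_date: date) -> date:
    """Date until which a record should stay locked, counted from its trigger event."""
    return add_years(trigger_date, RETENTION_YEARS[record_type])
```

For example, a bank account closed on March 15, 2021, would stay locked until March 15, 2026.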

3. Fulfilling a Legal Hold

When a company is sued, preserving all relevant records is wise. An article published by the American Bar Association points out that failing to preserve records may “undermine a litigant’s claims and defenses.” Given that many companies keep many (if not all) of their records in digital form, preserving digital records is essential. In this situation, using Object Lock to preserve records may be beneficial.

4. Meeting a Retention Period for Other Needs

Higher-risk business activities may benefit from preserving data with Object Lock. For example, an engineering company working on designing a bridge might use Object Lock to maintain records during the project. In software development, new versions of software may become unstable. Restoring to a previous version of the software, preserved from tampering or accidental deletion with Object Lock, can be valuable.

5. Replacing an LTO Tape System

In an LTO tape system, data immutability is conferred by a physical “air gap,” meaning there’s a literal gap of air between production data and backups stored on tape—the two are not physically connected in any way. Object Lock creates a virtual air gap, replacing the need for expensive physical infrastructure.


How to Add Object Lock to Your Security Policy

No matter the reason for implementing Object Lock, consistent usage is key. To encourage consistent usage, consider adding Object Lock as an option in your company’s security policy. Use the following tips as a guide on when and how to use Object Lock.

  • Set Up Object Lock Governance: Assign responsibility to a single manager in IT or IT security to develop Object Lock governance policies. Then, periodically review Object Lock governance and update retention policies as the security landscape evolves.
  • Evaluate the Application of Object Lock in Your Context: Are you subject to retention regulations? Do you have certain data you need to keep for an extended period of time? Take an inventory of your data and any specific retention considerations you may want to keep in mind when implementing Object Lock.
  • Document Object Lock Requirements: There are different ways to explain and communicate Object Lock guidelines. If your IT security policy focuses on high-level principles, consider adding Object Lock to a data management procedure instead.
  • Add Object Lock to Your Policy for Cloud Tools: Review your cloud solutions to see which providers support Object Lock. Only a few storage platforms currently offer the feature, but if your provider is one of them, you can enable Object Lock and specify the length of time an object should be locked in the storage provider’s user interface, via your backup software, or by using API calls.
  • Use Change Management to Promote the Change to the Policy Internally: Writing Object Lock into your policy is a good step, but it is not the end of the process. You also need to communicate the change internally and ensure employees who need to use Object Lock are trained on the Object Lock policies and procedures.
  • Testing and Monitoring: Periodically review whether Object Lock is being used per the established policies and whether data is being properly protected as outlined. As a starting point, review Object Lock usage quarterly and spot check data to ensure it’s locked.
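For providers with an S3-compatible API, Object Lock settings can be supplied per object at upload time. The sketch below assumes the AWS SDK for Python (boto3), whose put_object call accepts ObjectLockMode ("GOVERNANCE" or "COMPLIANCE"), ObjectLockRetainUntilDate, and ObjectLockLegalHoldStatus; the bucket must have Object Lock enabled, and which modes and options a given provider supports should be confirmed in its documentation. The helper names, bucket, and policy numbers are hypothetical. To stay runnable without credentials, the code only builds the request arguments and audits object metadata for the quarterly spot check described above.

```python
from datetime import datetime, timedelta, timezone


def lock_params(bucket: str, key: str, body: bytes, retention_days: int,
                mode: str = "COMPLIANCE", legal_hold: bool = False) -> dict:
    """Keyword arguments for an S3-style put_object call that locks the new object."""
    if mode not in ("GOVERNANCE", "COMPLIANCE"):
        raise ValueError("mode must be 'GOVERNANCE' or 'COMPLIANCE'")
    params = {
        "Bucket": bucket,
        "Key": key,
        "Body": body,
        "ObjectLockMode": mode,
        "ObjectLockRetainUntilDate": datetime.now(timezone.utc) + timedelta(days=retention_days),
    }
    if legal_hold:
        params["ObjectLockLegalHoldStatus"] = "ON"  # e.g., for a litigation hold
    return params


def audit(objects: list, min_days_remaining: int) -> list:
    """Spot check: keys whose lock is missing or expires sooner than policy requires.
    Each item is a dict with 'Key' plus, if locked, 'ObjectLockRetainUntilDate'
    (the field name S3 head_object responses use)."""
    cutoff = datetime.now(timezone.utc) + timedelta(days=min_days_remaining)
    return [
        obj["Key"]
        for obj in objects
        if obj.get("ObjectLockRetainUntilDate") is None
        or obj["ObjectLockRetainUntilDate"] < cutoff
    ]


# Applying it with boto3 (not run here; supply your own endpoint, credentials, and bucket):
#   import boto3
#   s3 = boto3.client("s3", endpoint_url="https://<your-s3-compatible-endpoint>")
#   s3.put_object(**lock_params("my-backups", "db/full.dump", data, retention_days=30))
```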

Adding Object Lock to Your Security Tool Kit

Object Lock is a helpful way to protect data from being changed. It can help your organization meet records retention requirements and make it easier to recover from a cyber attack. It’s one tool that can strengthen a robust IT security practice, but you first need a well-developed backup program to keep your company operating in the event of a disruption. To find out more about emerging backup strategies, check out our explainer, “What’s the Diff: 3-2-1 vs. 4-3-2-1-0 vs. 4-3-2” to keep your valuable company data safe. And, for a comprehensive ransomware prevention playbook, check out our Complete Guide to Ransomware.

The post How to Add Object Lock to Your IT Security Policy appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

Crash CORS: A Guide for Using CORS

Post Syndicated from Molly Clancy original https://www.backblaze.com/blog/crash-cors-a-guide-for-using-cors/

The dreaded CORS error is the bane of many a web developer’s day-to-day life. Even for experts, it can be eternally frustrating. Today, we’re digging into CORS, starting with the basics and working up to sharing specific code samples to help you enable CORS for your web development project. What is CORS? Why do you need it? And how can you enable it to save time and resources?

We’ll answer those questions so you can put the CORS bane behind you.

What Is CORS?

CORS stands for cross-origin resource sharing. To define CORS, we first need to explain its counterpoint: the same-origin policy, or SOP. The SOP is a security policy that all modern web browsers enforce, and it dictates that a script running on a page from one origin can only read data from that same origin. CORS is a mechanism that lets a server relax the SOP, so that pages from other origins it trusts can request and read its data.

Let’s break that down piece by piece, then we’ll get into why you’d want to bypass the same-origin policy in the first place.

What Is an Origin?

Every website has an origin, defined by the protocol, domain, and port of its URL.

You’re probably familiar with the working parts of a URL, but each has a different function. The protocol, also known as the scheme, identifies the method for exchanging data. Typical protocols are http, https, or mailto. The domain, also known as the hostname, identifies the website’s server. You may not be as familiar with the port, as it’s not normally visible in a typical web address. Just like a port on the water, it’s the connection point where information comes in and out of a server. By convention, different port numbers are used for different kinds of traffic: http uses port 80 by default and https uses port 443, which is why you rarely see them in a URL.

When you understand what an origin is, the “cross-origin” part of CORS makes a bit more sense. It simply means web addresses with different origins. In web addresses with the same origin, the protocol, domain, and port all match.
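Here is a minimal sketch of the origin comparison, using Python’s standard urllib.parse. It handles only http and https URLs (browsers have extra rules for other schemes), and the catblaze.com hostnames are hypothetical.

```python
from typing import Optional, Tuple
from urllib.parse import urlsplit

# Default ports assumed when a URL doesn't specify one.
DEFAULT_PORTS = {"http": 80, "https": 443}


def origin(url: str) -> Tuple[str, str, Optional[int]]:
    """(scheme, host, port) for an http(s) URL, filling in the default port."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    port = parts.port if parts.port is not None else DEFAULT_PORTS.get(scheme)
    return scheme, parts.hostname or "", port


def same_origin(a: str, b: str) -> bool:
    """Two URLs share an origin only if scheme, host, and port all match."""
    return origin(a) == origin(b)
```

So https://www.catblaze.com/videos and https://www.catblaze.com:443/cats share an origin (443 is the https default), while https://api.catblaze.com, http://www.catblaze.com, and https://www.catblaze.com:8443 are each a different origin from https://www.catblaze.com.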

What Is the Same-origin Policy?

The same-origin policy dates back to 1995, when Netscape introduced it shortly after adding JavaScript to its Navigator browser, and it’s one of the browser’s main defenses against malicious websites that try to piggyback on your logged-in sessions. One such attack uses cookies stored in people’s browsers to make requests to other websites illicitly. This is known as cross-site request forgery, or CSRF, pronounced “sea surf.” It’s also known as “session riding.” Tubular.

Let’s say you log in to Netflix on your laptop to add Ridley Scott’s 1982 classic, “Blade Runner” to your queue, as one does. You click “Remember Me” so you don’t have to log in every time, and your browser keeps your credentials stored in a cookie so that the Netflix site knows you are logged in no matter where you navigate within their site.

Afterwards, you’re bored, so you fall down an internet rabbit hole wondering why “Blade Runner” is called “Blade Runner” when there are few blades and little running. You end up on a site about samurai swords that happens to be malicious—it has a script in its code that uses your authentication credentials stored in that cookie to make a request to Netflix that can change your address and add a bunch of DVDs to your queue (also, it’s 2006, and this actually happened). You’ve become a victim of cross-site request forgery.

To limit this kind of threat, browsers apply the same-origin policy, which stops a script on one origin from reading responses from another origin.

Why Do You Need CORS?

While the same-origin policy helped stop bad actors from nefariously accessing websites, it posed a problem—sometimes you need or want assets and data from different origins. This is where the “resource sharing” part of cross-origin resource sharing comes in.

CORS allows you to set rules governing which origins are allowed to bypass the same-origin policy so you can share resources from those origins.

For example, you might host your website’s front end at www.catblaze.com, but you host your back-end API at api.catblaze.com. Or, you might need to display a bunch of cat videos stored in Backblaze B2 Cloud Storage on your website, www.catblaze.com (more on that below).
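CORS rules are enforced through HTTP response headers: the server names the origins it trusts, and the browser decides whether the page’s script may read the response. A minimal, framework-agnostic sketch for the catblaze.com example above, assuming the API at api.catblaze.com should accept requests only from the front end at www.catblaze.com. The Access-Control-* header names are the standard CORS headers; the allowed methods, headers, and ten-minute cache time are illustrative choices, and cors_headers is a hypothetical helper.

```python
ALLOWED_ORIGINS = {"https://www.catblaze.com"}  # the hypothetical front end
ALLOWED_METHODS = {"GET", "POST", "PUT"}
ALLOWED_HEADERS = {"content-type", "authorization"}


def cors_headers(method: str, request_headers: dict) -> dict:
    """CORS response headers the API might send for a request.
    Returns {} when the origin or request isn't allowed, so the browser blocks it."""
    h = {k.lower(): v for k, v in request_headers.items()}  # header names are case-insensitive
    origin = h.get("origin")
    if origin not in ALLOWED_ORIGINS:
        return {}
    headers = {"Access-Control-Allow-Origin": origin, "Vary": "Origin"}
    if method.upper() == "OPTIONS" and "access-control-request-method" in h:
        # Preflight: report which methods and headers the real request may use.
        requested = h["access-control-request-method"].upper()
        wanted = {x.strip().lower()
                  for x in h.get("access-control-request-headers", "").split(",")
                  if x.strip()}
        if requested not in ALLOWED_METHODS or not wanted <= ALLOWED_HEADERS:
            return {}
        headers["Access-Control-Allow-Methods"] = ", ".join(sorted(ALLOWED_METHODS))
        headers["Access-Control-Allow-Headers"] = ", ".join(sorted(ALLOWED_HEADERS))
        headers["Access-Control-Max-Age"] = "600"  # browser may cache this preflight for 10 minutes
    return headers
```

Echoing back the specific allowed origin (with Vary: Origin so caches keep per-origin copies) is what keeps the policy narrow; sending Access-Control-Allow-Origin: * would let any site’s scripts read the responses.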

Do I Need CORS?

Let’s say you have a website, and you dabble in some coding. You’re probably thinking, “I can already use images and stuff from other websites. Do I need CORS?” And you’re right to ask. Browsers let a page embed resources like images from other origins, and they send simple http requests like get, head, and basic post requests without requiring CORS rules to be set in advance. Embedding images from other sites, for example, typically requires a get request to grab that data from a different origin. If that’s all you need, you’re good to go. You can use simple requests to embed images and other data from other websites without worrying about CORS.

But some assets, like web fonts, and any data that your own scripts fetch and read from another origin, won’t work without CORS. If you’re a casual user, though, you can probably stop here.
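Whether the browser sends a cross-origin request straight away or first asks permission with an options “preflight” request is decided by rules in the WHATWG Fetch standard. The sketch below is a simplified approximation of those rules (the real ones have more edge cases, such as limits on header values), and needs_preflight is a hypothetical name. Skipping the preflight only means the request gets sent: a script still can’t read the response unless the server answers with an Access-Control-Allow-Origin header that permits the page’s origin.

```python
SIMPLE_METHODS = {"GET", "HEAD", "POST"}
# Request headers a script may set without triggering a preflight (simplified list).
SAFELISTED_HEADERS = {"accept", "accept-language", "content-language", "content-type"}
SIMPLE_CONTENT_TYPES = {"application/x-www-form-urlencoded", "multipart/form-data", "text/plain"}


def needs_preflight(method: str, headers: dict) -> bool:
    """Rough approximation of when a browser sends an OPTIONS preflight first."""
    if method.upper() not in SIMPLE_METHODS:
        return True
    for name, value in headers.items():
        name = name.lower()
        if name not in SAFELISTED_HEADERS:
            return True
        # Only three content types keep a request "simple"; parameters like charset are ignored.
        if name == "content-type" and value.split(";")[0].strip().lower() not in SIMPLE_CONTENT_TYPES:
            return True
    return False
```

A plain get, or a post with a text/plain body, goes straight out; a post with a JSON body, a put, or any request carrying an Authorization header triggers a preflight first.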

Coding Explainer: What Is an http Request?

An http request is the way you, the client, use code to talk to a server. A complete http request includes a request line, request headers, and, if necessary, a message body. The request line specifies the method of the request. These are the eight methods you’re most likely to encounter:

  • get: “I want something from the server.”
  • head: “I want something from the server, but only give me the headers, not the content.”
  • post: “I want to send something to the server.”
  • put: “I want to send something to the server that replaces something else.”
  • delete: …self-explanatory.
  • patch: “I want to change something on the server.”
  • options: “Is this request going to go through?” (This one is important for CORS!)
  • trace: “Show me the request you received.” Useful for debugging.

After the method, the request line also specifies the path of the URL the method applies to as well as the http version.

The request headers communicate some specifics around how you want to receive the information. There are usually a whole bunch of these. (Wikipedia has a good list.) They’re typically formatted thusly: name:value. For example:

accept:text/plain means you want the response to be in text format.

Finally, the message body contains anything you might want to send. For example, if you use the method post, the message body would contain what you want to post.
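To make the three parts concrete, here is a minimal sketch that lays out a request the way HTTP/1.1 sends it as plain text: the request line, name:value headers, a blank line, then the body. The host, path, header values, and the formatHttpRequest helper are made-up examples. In practice, browsers and fetch build this text for you.

```javascript
// Minimal sketch: format an HTTP/1.1 request as the text a client sends.
// The host, path, and headers used below are illustrative, not a real API.
function formatHttpRequest(method, path, headers, body = "") {
  // Request line: method, path, and HTTP version.
  const requestLine = `${method.toUpperCase()} ${path} HTTP/1.1`;
  // Request headers, one "name: value" pair per line.
  const headerLines = Object.entries(headers).map(([name, value]) => `${name}: ${value}`);
  if (body) {
    // Content-Length counts bytes, so measure the UTF-8 encoding of the body.
    headerLines.push(`Content-Length: ${new TextEncoder().encode(body).length}`);
  }
  // HTTP uses CRLF line endings; an empty line ends the headers, then the body follows.
  return [requestLine, ...headerLines, "", body].join("\r\n");
}

const raw = formatHttpRequest(
  "post",
  "/cats",
  { Host: "api.catblaze.com", Accept: "text/plain", "Content-Type": "text/plain" },
  "Mittens"
);
console.log(raw);
```

A get request with no body simply ends after the blank line that follows the headers.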

Do I Need CORS With Cloud Storage?

People use cloud storage for all manner of data storage purposes, and most do not need to use CORS to access resources stored in a cloud instance. For example, you can make API calls to Backblaze B2 from any computer to use the resources you have stored in your storage buckets. If you’re running a mobile application and transferring data back and forth to Backblaze B2, for instance, you don’t need CORS. The mobile application doesn’t rely on a web browser.

You only need CORS if you’re specifically running code inside of a web browser and you need to make API calls from the browser to Backblaze B2. One example is an in-browser video player that plays videos stored in Backblaze B2.

Fortunately, if you do need CORS, Backblaze B2 allows you to configure CORS to your exact specifications, while other cloud providers may have completely open CORS policies. Why is that important? An open CORS policy makes you vulnerable to CSRF attacks. To continue with the video example, let’s say you’re storing a bunch of videos that you want to make available on your website. If they’re stored with a cloud provider that has an open CORS policy, you have two choices: open or closed. You pick open so that your website visitors can call up those videos on demand, but that leaves you vulnerable to a CSRF attack that could allow a bad actor to download your videos. With Backblaze, you can specify the exact CORS rules you need.

If you are using Backblaze B2 to store data that will be displayed in a browser, or you’re just curious, read on to learn more about using CORS. CORS has saved developers lots of time and money by reducing maintenance effort and code complexity.

How Does CORS Work?

Unlike simple get, head, and post requests, some types of requests can alter the target server’s data. These include requests like delete, put, and patch. Any type of request that could alter the server’s data will trigger CORS. So will simple requests that carry non-standard http headers, such as custom headers added by AJAX code. When CORS is triggered, the browser sends what’s called a preflight request to see if the CORS rules allow the request.
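As a rough illustration of which requests trigger a preflight, the sketch below follows a simplified version of the Fetch standard’s “CORS-safelisted” rules. Get, head, and post are safelisted methods, and only a few headers (with restricted content types) are safelisted. The sketch ignores finer details of the standard, such as header value limits and newer safelist additions, so treat it as a teaching aid rather than any browser’s actual logic. The needsPreflight name is hypothetical.

```javascript
// Simplified sketch: would a browser send a CORS preflight before this request?
const SAFELISTED_METHODS = new Set(["GET", "HEAD", "POST"]);
const SAFELISTED_HEADERS = new Set(["accept", "accept-language", "content-language", "content-type"]);
const SAFELISTED_CONTENT_TYPES = new Set([
  "application/x-www-form-urlencoded",
  "multipart/form-data",
  "text/plain",
]);

function needsPreflight(method, headers = {}) {
  // Any method other than get, head, or post needs a preflight.
  if (!SAFELISTED_METHODS.has(method.toUpperCase())) return true;
  for (const [name, value] of Object.entries(headers)) {
    const lower = name.toLowerCase();
    // Any non-safelisted header (for example, a custom X- header) needs a preflight.
    if (!SAFELISTED_HEADERS.has(lower)) return true;
    if (lower === "content-type") {
      // Compare only the MIME type, dropping parameters such as "; charset=utf-8".
      const mime = String(value).split(";")[0].trim().toLowerCase();
      if (!SAFELISTED_CONTENT_TYPES.has(mime)) return true;
    }
  }
  return false;
}

console.log(needsPreflight("GET")); // false
console.log(needsPreflight("GET", { "X-Fake-Header": "breaking-cors-for-fun" })); // true
console.log(needsPreflight("DELETE")); // true
```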

What Is a Preflight Request?

A preflight request, also known as an options request, asks the server if it’s okay to make the CORS request. If the preflight request comes back successfully, then the browser will complete the actual request. Few other systems in computing send this kind of automatic permission check by default, so it’s worth understanding the extra round trip when you work with CORS.

A preflight request has the following headers:

  • origin: Identifies the origin from which the CORS request originates.
  • access-control-request-method: Identifies the method of the CORS request.
  • access-control-request-headers: Lists the headers that will be included in the CORS request.

The web server then responds with the following headers:

  • access-control-allow-origin: Confirms the origin is allowed.
  • access-control-allow-methods: Confirms the methods are allowed.
  • access-control-allow-headers: Confirms the headers are allowed.

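The values in these response headers must cover what the preflight request asked for. The allowed origin must match the requesting origin (or be the wildcard “*”), and the allowed methods and headers must include the ones requested. If so, the browser will permit the actual CORS request to come through.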
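The browser-side comparison can be sketched as a small function that checks a preflight response against the request the page wants to make. This is a simplified, hypothetical helper (preflightAllows). It assumes lowercase header names and a request without credentials, and it skips other details of the Fetch standard. One detail it does keep: the wildcard “*” in the allowed-headers list does not cover the authorization header.

```javascript
// Sketch of the browser-side check on a preflight response (simplified).
// Assumes header names in response.headers are lowercase and the request sends no
// credentials (credentialed requests cannot rely on "*" at all).
function parseList(value) {
  return (value || "").split(",").map((s) => s.trim().toLowerCase()).filter(Boolean);
}

function preflightAllows(request, response) {
  // The preflight itself must succeed with a 2xx status.
  if (response.status < 200 || response.status > 299) return false;
  const h = response.headers;

  // Origin: the wildcard or an exact match.
  const allowOrigin = h["access-control-allow-origin"];
  if (allowOrigin !== "*" && allowOrigin !== request.origin) return false;

  // Method: get, head, and post pass; anything else must be listed or covered by "*".
  const method = request.method.toLowerCase();
  const methods = parseList(h["access-control-allow-methods"]);
  const safelisted = ["get", "head", "post"].includes(method);
  if (!safelisted && !methods.includes("*") && !methods.includes(method)) return false;

  // Headers: each requested header must be listed; "*" covers all but authorization.
  const allowed = parseList(h["access-control-allow-headers"]);
  return request.headers.every((name) => {
    const n = name.toLowerCase();
    return allowed.includes(n) || (allowed.includes("*") && n !== "authorization");
  });
}

const plannedRequest = { origin: "https://www.catblaze.com", method: "GET", headers: ["X-Fake-Header"] };
const restrictive = {
  status: 200,
  headers: {
    "access-control-allow-origin": "*",
    "access-control-allow-methods": "GET, HEAD",
    "access-control-allow-headers": "authorization, range",
  },
};
const permissive = { ...restrictive, headers: { ...restrictive.headers, "access-control-allow-headers": "*" } };

console.log(preflightAllows(plannedRequest, restrictive)); // false: X-Fake-Header is not listed
console.log(preflightAllows(plannedRequest, permissive)); // true
```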

Setting CORS Up: An Example

To provide an example for setting CORS up, we’ll use Backblaze B2. By default, the Backblaze B2 servers will say “no” to preflight requests. Adding CORS rules to your bucket tells Backblaze B2 which preflight requests to approve. You can enable CORS in the Backblaze B2 UI if you only need to allow one specific origin or if you want to share the bucket with all origins.

Click the CORS rules link to configure CORS.
In the CORS rules pop-up, you can choose how you want to configure CORS rules.

If you need more specificity than that, you can select the option for custom rules and use the Backblaze B2 command line tool.

When a CORS preflight or cross-origin download is requested, Backblaze B2 evaluates the CORS rules on the file’s bucket. Rules may be set at the time you create the bucket with b2_create_bucket or updated on an existing bucket using b2_update_bucket.

CORS rules only affect Backblaze B2 operations in their “allowedOperations” list. Every rule must list at least one operation in its allowedOperations.
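If you’d rather set rules through the API than through the UI or the command line tool, the sketch below calls b2_update_bucket from JavaScript. It assumes you have already called b2_authorize_account to get an apiUrl and authorizationToken, that you know your accountId and bucketId, and that the runtime provides a global fetch (modern browsers, Node 18 or later). The helper names are hypothetical. The v2 path and the replace-all behavior of corsRules reflect our reading of the B2 Native API, so please check the documentation for the version you use.

```javascript
// Sketch: set a bucket's CORS rules with the B2 Native API call b2_update_bucket.
// apiUrl and authorizationToken are assumed to come from an earlier b2_authorize_account call.
function buildUpdateBucketRequest({ apiUrl, authorizationToken, accountId, bucketId, corsRules }) {
  return {
    url: `${apiUrl}/b2api/v2/b2_update_bucket`,
    init: {
      method: "POST",
      headers: { Authorization: authorizationToken },
      // Assumption: the list sent here replaces the bucket's existing rules,
      // so include every rule you want to keep.
      body: JSON.stringify({ accountId, bucketId, corsRules }),
    },
  };
}

async function updateBucketCors(params) {
  const { url, init } = buildUpdateBucketRequest(params);
  const resp = await fetch(url, init);
  if (!resp.ok) {
    throw new Error(`b2_update_bucket failed with ${resp.status}: ${await resp.text()}`);
  }
  return resp.json(); // the bucket as it looks after the update
}
```

Keeping request building separate from sending makes it easy to inspect the exact JSON before anything touches your bucket.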

CORS Rule Structure

Each CORS rule may have the following parameters:


  • corsRuleName: A name that humans can recognize to identify the rule.
  • allowedOrigins: A list of the origins you want to allow.
  • allowedOperations: A list that specifies the operations you want to allow, including:
    • B2 Native API Operations:
      • b2_download_file_by_name
      • b2_download_file_by_id
      • b2_upload_file
      • b2_upload_part
    • S3 Compatible Operations:
      • s3_delete
      • s3_get
      • s3_head
      • s3_post
      • s3_put


  • allowedHeaders: A list of headers that are allowed in a preflight request’s Access-Control-Request-Headers value.
  • exposeHeaders: A list of headers that may be exposed to an application inside the client.
  • maxAgeSeconds: The maximum number of seconds that a browser can cache the response to a preflight request.
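Before uploading rules, it can help to sanity-check them locally. The hypothetical validateCorsRule helper below checks a name, at least one operation from the list above, and a non-negative maxAgeSeconds. It also assumes that allowedOrigins should be non-empty, because a rule with no origins could never match. The operation names are written in lowercase (b2_…, s3_…), which is how we understand the B2 API spells them. This is not Backblaze’s own validation, and the server may reject rules that this accepts.

```javascript
// Hypothetical helper: minimal local checks for one CORS rule object.
const KNOWN_OPERATIONS = new Set([
  "b2_download_file_by_name", "b2_download_file_by_id", "b2_upload_file", "b2_upload_part",
  "s3_delete", "s3_get", "s3_head", "s3_post", "s3_put",
]);

function validateCorsRule(rule) {
  const problems = [];
  if (typeof rule.corsRuleName !== "string" || rule.corsRuleName.length === 0) {
    problems.push("corsRuleName is required");
  }
  // Assumption: a rule with no allowed origins would never match, so flag it.
  if (!Array.isArray(rule.allowedOrigins) || rule.allowedOrigins.length === 0) {
    problems.push("allowedOrigins must list at least one origin");
  }
  // Every rule needs at least one allowed operation from the known list.
  if (!Array.isArray(rule.allowedOperations) || rule.allowedOperations.length === 0) {
    problems.push("allowedOperations must list at least one operation");
  } else {
    for (const op of rule.allowedOperations) {
      if (!KNOWN_OPERATIONS.has(op)) problems.push(`unknown operation: ${op}`);
    }
  }
  const age = rule.maxAgeSeconds;
  if (age !== undefined && !(Number.isInteger(age) && age >= 0)) {
    problems.push("maxAgeSeconds must be a non-negative integer");
  }
  return problems; // an empty array means no problems were found
}
```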

The following sample configuration allows downloads, including range requests, from any https origin and will tell browsers that it’s okay to expose the ‘x-bz-content-sha1’ header to the web page.

[
  {
    "corsRuleName": "downloadFromAnyOrigin",
    "allowedOrigins": [
      "https"
    ],
    "allowedHeaders": ["range"],
    "allowedOperations": [
      "b2_download_file_by_id",
      "b2_download_file_by_name"
    ],
    "exposeHeaders": ["x-bz-content-sha1"],
    "maxAgeSeconds": 3600
  }
]

You may add up to 100 CORS rules to each of your buckets. Backblaze B2 uses the first rule that matches the request. A CORS preflight request matches a rule if the origin header matches one of the rule’s allowedOrigins, if the operation is in the rule’s allowedOperations, and if every value in the Access-Control-Request-Headers is in the rule’s allowedHeaders.
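The first-match logic described above can be sketched in a few lines. The findMatchingRule function is a hypothetical helper, not Backblaze’s code. It assumes that “*” in allowedOrigins matches any origin and that “https” matches any https:// origin, which is how we read the sample rule’s “from any https origin.” It also assumes that “*” in allowedHeaders matches any header and that header names compare case-insensitively. Partial wildcards such as https://*.example.com are not modeled.

```javascript
// Hypothetical sketch of first-match CORS rule selection for a preflight request.
function originMatches(allowed, origin) {
  if (allowed === "*") return true; // assumption: wildcard matches any origin
  if (allowed === "https") return origin.startsWith("https://"); // any https origin
  return allowed === origin; // otherwise require an exact match
}

function findMatchingRule(rules, { origin, operation, requestHeaders = [] }) {
  for (const rule of rules) {
    const allowedHeaders = (rule.allowedHeaders || []).map((h) => h.toLowerCase());
    // Every header named in Access-Control-Request-Headers must be allowed.
    const headersOk = requestHeaders.every(
      (h) => allowedHeaders.includes("*") || allowedHeaders.includes(h.toLowerCase())
    );
    if (
      rule.allowedOrigins.some((a) => originMatches(a, origin)) &&
      rule.allowedOperations.includes(operation) &&
      headersOk
    ) {
      return rule; // the first matching rule wins
    }
  }
  return null; // no rule matched, so the preflight is refused
}

const rules = [
  {
    corsRuleName: "downloadFromAnyOrigin",
    allowedOrigins: ["https"],
    allowedHeaders: ["range"],
    allowedOperations: ["b2_download_file_by_id", "b2_download_file_by_name"],
  },
];
const hit = findMatchingRule(rules, {
  origin: "https://www.catblaze.com",
  operation: "b2_download_file_by_name",
  requestHeaders: ["Range"],
});
console.log(hit && hit.corsRuleName); // downloadFromAnyOrigin
```

Because the first match wins, put narrower rules before broader ones if their settings differ.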

Using CORS: Examples in Action

Using your browser’s console, you can copy and paste the following examples to see CORS requests succeed or fail. As a handy guide for you, the text files we’ll be requesting include the bucket configuration of the Backblaze B2 buckets we’re calling.

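In the first example, we’ll make a request to get the text file bucket_info.txt from a bucket named “cors-allow-none” that does not allow CORS requests. In the code, <download-host> is a placeholder for the Backblaze B2 download host of the account that owns the bucket:

fetch('https://<download-host>/file/cors-allow-none/bucket_info.txt', {
  method: 'GET'
}).then(resp => resp.text()).then(console.log)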

As you can see, this request returns a CORS error:

Next, we’ll try the same request on a bucket named “cors-allow-all” that allows CORS with any origin, but only specific headers.

fetch('https://<download-host>/file/cors-allow-all/bucket_info.txt', {
  method: 'GET'
}).then(resp => resp.text()).then(console.log)

When you run the code, you will see some text output to the console indicating that, indeed, the bucket allows CORS with all origins, but only specific headers:

We didn’t include any headers in our request, so the request was successful and the text file we wanted—bucket_info.txt—appears below the text output in the console. As you can see in the text output, the bucket is configured with an asterisk “*,” also known as a “wildcard,” to allow all origins (more on that later).

Next, we’ll try the same thing on the bucket that allows CORS with all origins, but this time we’ll include a header that triggers a preflight check and is not allowed:

fetch('https://<download-host>/file/cors-allow-all/bucket_info.txt', {
  method: 'GET',
  headers: { 'X-Fake-Header': 'breaking-cors-for-fun' }
}).then(resp => resp.text()).then(console.log)

Our bucket is configured to only allow the headers authorization and range, but we’ve included the header X-Fake-Header with the value breaking-cors-for-fun—definitely not allowed—in the request.

When we run this request, we can see another type of failure:

Below the request, but above the CORS errors, you’ll see that the browser sent an options request. As we mentioned earlier, this is the preflight request that asks the server if it’s okay to make the get request. In this case, the preflight request failed.

However, this request will succeed if we change our bucket settings to allow all headers.

fetch('https://<download-host>/file/cors-allow-all/bucket_info.txt', {
  method: 'GET',
  headers: { 'X-Fake-Header': 'breaking-cors-for-fun' }
}).then(resp => resp.text()).then(console.log)

Below, you can see the text output “This bucket allows CORS with all origins and any header values.”

The request was successful, and the text file we requested appears in the console.

At this point, it’s important to note that when configuring your own buckets, you should use caution when using the wildcard “*” to allow any origin or header. It’s probably best to avoid the wildcard if possible. It’s okay to allow any origin to access your bucket, but, if so, you’ll probably want to enumerate the headers that matter to avoid CSRF attacks.

For more information on using CORS with Backblaze B2, including some tips on using CORS with the Backblaze S3 Compatible API, check out our documentation here.

Stay on CORS

Ah, another inevitable CORS pun. Did you see it coming? I hope so. In conclusion, here are a few things to remember about CORS and how you can use it to avoid CORS errors in the future:

  • The same-origin policy was developed to make websites less vulnerable to threats, and it prevents requests between websites with different origins.
  • CORS bypasses the same-origin policy so that you can share and use data from different origins.
  • You only need to configure CORS rules for your Backblaze B2 data if you are making calls to Backblaze B2 from code within a web browser.
  • By setting CORS rules, you can specify which origins are allowed to request data from your Backblaze B2 buckets.

Are you using CORS? Do you have any other questions? Let us know in the comments.

The post Crash CORS: A Guide for Using CORS appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

Bringing Connected Container Management via Cycle

Post Syndicated from Elton Carneiro original https://www.backblaze.com/blog/bringing-connected-container-management-to-backblaze-cycle/

Containers have changed the way development teams build, deploy, and scale their applications. Unfortunately, the adoption of containers often brings with it ever-growing complexity and fragility that leads to developers spending more time managing their deployment rather than developing applications.

Today’s announcement offers developers a path to easier container orchestration. The Cycle and Backblaze B2 Cloud Storage integration enables companies that utilize containers to seamlessly automate and control stateful data across multiple providers—all from one dashboard.

This partnership empowers developers to:

  • Easily deploy containers without dealing with complex solutions like Kubernetes.
  • Unify their application management, including automating or scheduling backups via an API-connected portal.
  • Choose the microservices and technologies they need without compromising on functionality.

Getting started with Cycle and Backblaze is simple:

  1. Create a Cycle account.
  2. Sign up for B2 Cloud Storage and create a bucket and Application Key.
  3. Associate your Backblaze Application Key with your Cycle hub via the Integrations menu.
  4. Utilize the backup settings within a container config.

For more in-depth information on the integration, check out the documentation.

We recently sat down with Jake Warner, Co-founder and CEO of Cycle, to dig a little deeper into the major cloud orchestration challenges today’s developers face. You can find Jake’s responses below, or you can check out a conversation between Jake and Elton Carneiro, Backblaze Director of Partnerships, on Jake’s podcast.

About Cycle

Cycle set out to become the most developer-friendly container orchestration platform. By simplifying the processes around container and infrastructure deployments, Cycle enables developers to spend more time building and less time managing. With automatic platform updates, standardized deployments, a powerful API, and bottleneck-crushing automation, the platform empowers organizations to have the capabilities of an elite DevOps team at one-tenth the cost. Founded in 2015, the company is headquartered in Reno, NV. To learn more, please visit https://cycle.io.

A Q&A With Cycle

Q: Give us some background on Cycle. What problem are you trying to solve, and what is the problem with existing infrastructure like Kubernetes and similar platforms?

A: We believe that cloud orchestration, and more specifically, container orchestration, is broken. Instead of focusing on the most common use cases, too many companies within this space end up chasing a never-ending list of edge cases and features. This lack of focus yields overly complex, messy, and fragile deployments that cause more stress than peace-of-mind.

Cycle takes a bold approach to container orchestration: focus on the 80%. Most companies don’t require, or need, a majority of the features and capabilities that come with platforms like Kubernetes. By staying hyper-focused on where we spend our time, and prioritizing quality over quantity of features, the Cycle platform has become an incredibly stable and powerful foundation for companies of all sizes.

The goal for Cycle is simple: Be the most developer-friendly container platform for businesses.

We believe that developers should be able to easily utilize containers on their own infrastructure without having to deal with the mundane tasks commonly involved with managing infrastructure and orchestration platforms. Cycle makes that a reality.

Q: What are the major challenges developers face today with container orchestration?

A: Complexity. Most teams today have to piece together a wide variety of tools and technologies for even the most basic deployments. Instead of adopting solutions that empower development teams without increasing bloat, too many organizations are chasing hype and adding to their ever-growing pile of technical debt.

Additionally, most of today’s platforms require constant hand-holding. While there are many tools that help reduce the amount of time to get a first deployment online, we see a major drop-off when it comes to “day two operations.” What happens when there’s a new update, or security patch? How often are these updates released? How many developers/DevOps personnel are needed to apply these updates? How much downtime should be expected?

With Cycle, we reduce complexity by providing a single turnkey solution for developers—no extra tools required. Additionally, our platform is built around the idea that everything should be capable of automatically updating. On average, we deploy platform updates to our customer infrastructure once every 10-14 days.

Q: Before announcing our integration, had you implemented Backblaze internally? Could you expand on that?

A: Absolutely! In the early days of Cycle, we made the decision to standardize and flatten all container images into raw OCI images. This was our way of hedging against different technologies going through hype waves. At the time, Docker was the “top dog” in the container space but there was also CoreOS and a number of others.

In an effort to control as much of the vertical stack as possible, we decided that, beyond flattening images, we should also store the resulting images ourselves. This way, if Docker Hub or another container registry unexpectedly changed their APIs or pricing, our platform and users would be insulated from those changes. As you can see, we put a lot of thought into limiting external variables.

Given the above, we knew that having an infinitely scalable storage solution was critical for the platform. After testing a number of providers, Backblaze B2 was the perfect fit for our needs.

Fast-forward to today: all of our base images are stored on Backblaze.

Q: As alluded to above, you’re currently building a customer-facing integration. What’s the new feature? Have customers been asking for this?

A: We’re excited to announce that Cycle now supports automatic backups for stateful containers. A number of customers have been requesting this feature for a while and we’re thrilled to finally release it.

At Cycle, data ownership is very important to us—our platform was built specifically to empower developers while ensuring they, and their employers, still retain full ownership and control of their data. This automated backups feature is no different. By associating a Backblaze B2 API Key with Cycle, organizations can maintain ownership of their backups.

Q: What sparked the decision to partner and integrate with Backblaze specifically?

A: While there are a number of reasons this partnership makes a ton of sense, narrowing it down to a top three would be:

Performance: As we were testing different object storage providers, Backblaze B2 was routinely one of the most reliable while also offering solid upload and download speeds. We also liked that Backblaze B2 wasn’t overly bloated with features; it had exactly what we needed.

Cost: As Cycle continues to grow, and our storage needs increase, it’s incredibly important to keep costs in check. Beyond predictable and transparent pricing, the base cost per terabyte of data is impressive.

Team: Working with the Backblaze team has been incredible. From our early conversations with Nilay, Backblaze’s VP of Sales, to the expanded conversations with much of the Backblaze team today, everyone has been super eager to help.

Q: Backblaze and Cycle share a similar vision in making life easier for developers. It goes beyond just dollars saved, though that is a bonus, but what is it about “simple” that is so important? Infrastructure of the future?

A: Good question! There are a number of different ways to answer this, but for the sake of not turning this into an essay, let’s focus purely on what we, the Cycle team, refer to as “think long term.”

Anyone can make a process complex, but it takes a truly focused effort to keep things simple. Rules and guidelines are needed. You need to be able to say “No” to certain feature requests and customer demands. To be able to provide a polished and clean experience, you have to be purposeful in what you’re building. Far too often, companies will chase short-term wins while sacrificing long-term gains, but true innovation takes time. In a world where most tech companies are built off venture capital, long-term gambles and innovations are deprioritized.

From the way both Cycle and Backblaze have been funded, to a number of other aspects, we’ve both positioned our companies to take those long-term risks and focus on simplifying otherwise complex processes. It’s part of our culture, it’s who we are as teams and organizations.

As we talk about developers, we see a common pattern. Developers always love testing new technologies; they enjoy getting into the weeds and tweaking all of the variables. But, as time goes on, developers shift away from “I want to control every variable” to more of an “I just want something that works and gets out of the way.” This is where Cycle and Backblaze both excel.

Q: What can we look forward to as this partnership matures? Anything exciting on the horizon for Cycle that you’d like to share?

A: We’re really looking forward to expanding our partnership with Backblaze as the Cycle platform continues to grow. Combining powerful container orchestration with a seamless object storage solution can empower developers to more easily build the next generation of products and services.

While we now host our base container images and customer backups on Backblaze, this is still just the start. We have a number of really exciting features launching in 2022 that’ll further strengthen the partnership between our companies and the value we can provide to developers around the world.

Interested in learning more about the developer solutions available with Backblaze B2 Cloud Storage? Join us, free, for Developer Day on October 21 for announcements, tech talks, lessons, SWAG, and more to help you understand how B2 Cloud Storage can work for you. Register today.

The post Bringing Connected Container Management via Cycle appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

Using Machine Learning to Predict Hard Drive Failures

Post Syndicated from Andy Klein original https://www.backblaze.com/blog/using-machine-learning-to-predict-hard-drive-failures/

When we first published our Drive Stats data back in February 2015, we did it because it seemed like the Backblaze thing to do. We had previously open-sourced our Storage Pod designs, so publishing the Drive Stats data made sense for two reasons. The first was transparency. We were publishing our Drive Stats reports based on that data and we wanted people to trust the accuracy of those reports.

The second reason was that it gave people, many of whom are much more clever than us, the ability to play with the data, and that’s what they did. Over the years, the Drive Stats data has been used in projects ranging from training sets for college engineering and statistics students to being a source for scientific and academic papers and articles. In fact, using Google Scholar, you will find 105 papers and articles since 2018 where the Drive Stats data was cited as a source.

One of those papers is “Interpretable predictive maintenance for hard drives” by Maxime Amram et al., which describes a methodology for predicting hard drive failure using various machine learning techniques. At the center of the research team’s analysis is the Drive Stats data, but before we dive into that paper, let’s take a minute or two to understand the data itself.

Join us for a live webinar on October 14th at 10 a.m. Pacific to hear directly from Daisy Zhuo, PhD, co-founding partner of Interpretable AI, and Drive Stats author Andy Klein as they discuss the findings from the research paper. Register today.

The Hard Drive Stats Data

Each day, we collect operational data from all of the drives in our data centers worldwide. There is one record per drive per day. As of September 30, 2021, there were over 191,000 drives reporting each day. In total, there are over 266 million records going back to April 2013, and each data record consists of basic drive information (model, serial number, etc.) and any SMART attributes reported by a given drive. There are 255 pairs of SMART attributes possible for each drive model, but only 20 to 40 are reported by any given drive model.

Example of Drive Stats data collected for each drive.

The data reported comes from each drive itself and the only fields we’ve added manually are the date and the “failure” status: zero for operational and one if the drive has failed and been removed from service.
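To get a feel for the data’s shape, the sketch below parses a few Drive Stats-style rows and counts drive days and failures per model. The column names (date, serial_number, model, capacity_bytes, failure, and smart_N_raw) follow the published Drive Stats files. The rows are made up, only two SMART columns are kept, and the parseCsv and tallyByModel helpers are hypothetical.

```javascript
// Made-up rows using a subset of the Drive Stats columns.
const csv = `date,serial_number,model,capacity_bytes,failure,smart_5_raw,smart_187_raw
2020-01-01,ZA1,ST12000NM0007,12000138625024,0,0,0
2020-01-01,ZA2,ST12000NM0007,12000138625024,0,8,2
2020-01-02,ZA1,ST12000NM0007,12000138625024,0,0,0
2020-01-02,ZA2,ST12000NM0007,12000138625024,1,24,5`;

function parseCsv(text) {
  // Naive split on commas: fine for these unquoted fields, not for general CSV.
  const [headerLine, ...lines] = text.trim().split("\n");
  const columns = headerLine.split(",");
  return lines.map((line) => {
    const values = line.split(",");
    return Object.fromEntries(columns.map((col, i) => [col, values[i]]));
  });
}

function tallyByModel(records) {
  const stats = {};
  for (const r of records) {
    if (!stats[r.model]) stats[r.model] = { driveDays: 0, failures: 0 };
    stats[r.model].driveDays += 1; // one record per drive per day
    stats[r.model].failures += Number(r.failure); // 1 only on the day a drive fails
  }
  return stats;
}

const stats = tallyByModel(parseCsv(csv));
console.log(stats); // { ST12000NM0007: { driveDays: 4, failures: 1 } }
```

Totals like these feed Backblaze’s annualized failure rate, which is failures divided by (drive days / 365), expressed as a percentage.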

Predicting Drive Failure

Using the SMART data to predict drive failure is not new. Drive manufacturers have been trying to use various drive-reported statistics to identify pending failure since the early 1990s. The challenge is multifold, as you need to be able to:

  • Determine a SMART attribute or group of attributes that predicts imminent failure in a given drive model.
  • Keep the false positive rate to a minimum.
  • Test the hypothesis on a large enough data set with both failed and operational drives.

Even if you can find a combination of SMART attributes which fit the bill, you are faced with two other realities:

  • By the time you determine and test your hypothesis, is the drive model being tested still being manufactured?
  • Is the prediction really useful? For example, what is the value of an attribute that is 95% accurate at predicting failure but occurs in only 1% of all failures?
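The second question is easier to see with numbers. In the hypothetical example below, 9,500 drives fail, and an attribute flags 100 drives, 95 of which go on to fail. Its precision is 95%, yet its recall is only 1%, so it would miss almost every failure. The counts are invented for illustration.

```javascript
// Precision: of the drives the attribute flags, what share actually fail?
// Recall: of all drives that fail, what share did the attribute flag?
function precisionAndRecall({ truePositives, falsePositives, totalFailures }) {
  return {
    precision: truePositives / (truePositives + falsePositives),
    recall: truePositives / totalFailures,
  };
}

// Hypothetical counts: 100 drives flagged, 95 of them fail, 9,500 failures overall.
const result = precisionAndRecall({ truePositives: 95, falsePositives: 5, totalFailures: 9500 });
console.log(result); // { precision: 0.95, recall: 0.01 }
```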

Machine Learning and Drive Failure

Before we start, we highly encourage you to read the paper, “Interpretable predictive maintenance for hard drives,” noted earlier. This will provide you with the context for what we will cover here.

The paper sets out to examine whether the application of algorithms for interpretable machine learning would provide meaningful insights about short-term and long-term drive health and accurately predict drive failure in the process.

Analysis of Backblaze and Google Methodologies

Backblaze (Beach/2014 and Klein/2016) and Google (Pinheiro et al./2007) analyzed the SMART data they collected to identify signs of drive failure in a population of hard drives. Each identified similar SMART attributes that correlated, to some degree, with drive failure:

  • 5 (Reallocated Sectors Count).
  • 187 (Reported Uncorrectable Errors).
  • 188 (Command Timeout)—Backblaze only.
  • 197 (Current Pending Sectors Count).
  • 198 (Offline Uncorrectable Sectors Count).

Given that both were univariate analyses, i.e., analyses that consider the correlation between drive failure and only a single metric at a time, the results, while useful, left open the opportunity for validation using more advanced methods. That’s where machine learning comes in.
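A univariate analysis boils down to something like the sketch below: split drives on a single SMART attribute and compare failure rates. The data and the “raw value above zero” split are illustrative assumptions, not the exact method of the Backblaze or Google studies. Because the sketch looks at one attribute at a time, it cannot capture how attributes interact, and that gap is what a multivariate model can fill.

```javascript
// Sketch: compare failure rates for drives with a nonzero vs. zero raw value of one attribute.
function failureRateSplit(drives, attribute) {
  const groups = { nonZero: { drives: 0, failed: 0 }, zero: { drives: 0, failed: 0 } };
  for (const d of drives) {
    const g = d[attribute] > 0 ? groups.nonZero : groups.zero;
    g.drives += 1;
    g.failed += d.failed ? 1 : 0;
  }
  const rate = (g) => (g.drives ? g.failed / g.drives : 0);
  return { nonZeroRate: rate(groups.nonZero), zeroRate: rate(groups.zero) };
}

// Made-up drives: four with SMART 5 at zero, two with reallocated sectors.
const drives = [
  { smart_5_raw: 0, failed: false }, { smart_5_raw: 0, failed: false },
  { smart_5_raw: 0, failed: false }, { smart_5_raw: 0, failed: true },
  { smart_5_raw: 16, failed: true }, { smart_5_raw: 8, failed: false },
];
console.log(failureRateSplit(drives, "smart_5_raw")); // { nonZeroRate: 0.5, zeroRate: 0.25 }
```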

Predicting Long-term Drive Health

For their analysis in the paper, Interpretable AI focused on the Seagate 12TB drive, model: ST12000NM0007, for the period ending Q1 2020, and analyzed daily records of over 35,000 drives.

An overview of the methodology used by Interpretable AI to predict long-term drive health is as follows:

  • For each drive, on each day, compute the remaining useful life until failure.
  • Use that data combined with the daily SMART data to train an Optimal Survival Tree that models how the remaining life of each drive is affected by the SMART values.
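The first step can be sketched as follows. For every daily record, the code computes the days remaining until that drive’s failure. Drives that never fail during the observation window are “censored”: we only know they survived at least until their last record. For those drives, the sketch records the days to that last record with event set to false. This is a generic survival-analysis setup in a hypothetical helper (remainingLife), not code from the paper.

```javascript
// Sketch: turn daily drive records into (days remaining, event) pairs for survival analysis.
const DAY_MS = 24 * 60 * 60 * 1000;

function remainingLife(records) {
  // Per drive: the day it failed, or else the last day it was observed.
  const endBySerial = new Map();
  for (const r of records) {
    const t = Date.parse(r.date); // "YYYY-MM-DD" strings parse as UTC midnight
    const cur = endBySerial.get(r.serial) || { end: -Infinity, failed: false };
    if (r.failure === 1) {
      cur.end = t;
      cur.failed = true;
    } else if (!cur.failed && t > cur.end) {
      cur.end = t;
    }
    endBySerial.set(r.serial, cur);
  }
  // Per record: days until that end point; event=false marks a censored drive.
  return records.map((r) => {
    const { end, failed } = endBySerial.get(r.serial);
    const days = Math.round((end - Date.parse(r.date)) / DAY_MS);
    return { serial: r.serial, date: r.date, days, event: failed };
  });
}

const daily = [
  { serial: "ZA1", date: "2020-01-01", failure: 0 },
  { serial: "ZA1", date: "2020-01-03", failure: 0 },
  { serial: "ZA2", date: "2020-01-01", failure: 0 },
  { serial: "ZA2", date: "2020-01-02", failure: 1 },
];
console.log(remainingLife(daily));
```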

Each split node in the Optimal Survival Tree tests one SMART attribute and divides the drives into two child nodes, as determined by the analysis. Hard drives are recursively routed down the tree based on their SMART values until they reach a leaf. During training, the model chooses the attribute and threshold for each split so that, in the end, all the drives are divided into groups that best represent their collective survival behavior. Below is the Optimal Survival Tree for predicting long-term health (“Interpretable predictive maintenance for hard drives,” Figure 3).
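Routing a drive through a trained tree is simple once the splits are known. In the hypothetical tree below, each split compares one SMART raw value to a threshold, and each leaf holds a risk label. In the paper’s Optimal Survival Trees, each leaf instead carries a survival curve. The attributes, thresholds, and labels are invented and do not reproduce the figure.

```javascript
// Hypothetical tree: attributes, thresholds, and leaf labels are invented for illustration.
const tree = {
  attribute: "smart_5_raw",
  threshold: 0,
  left: { leaf: "lower risk" }, // smart_5_raw <= 0
  right: {
    attribute: "smart_187_raw",
    threshold: 5,
    left: { leaf: "elevated risk" }, // smart_187_raw <= 5
    right: { leaf: "high risk" },
  },
};

function routeDrive(node, smart) {
  // Follow splits until a leaf: go left when the value is <= the threshold.
  while (node.leaf === undefined) {
    const value = smart[node.attribute] ?? 0; // assumption: a missing value counts as 0
    node = value <= node.threshold ? node.left : node.right;
  }
  return node.leaf;
}

console.log(routeDrive(tree, { smart_5_raw: 8, smart_187_raw: 12 })); // high risk
```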

Optimal Survival Tree for predicting long-term health.

At the top of the tree is SMART 5 (raw value), which is deemed the most important SMART value to determine drive failure in this case, but it is not alone. Traveling down the branches of the tree, other SMART attributes become part of a given branch, adding or subtracting their value towards predicting drive health along the way. The analysis leads to some interesting results that univariate analysis cannot see:

  • Poor Drive Health: The path to Node 11 is the set of conditions (SMART attribute values) that, if present, predicts the failure of the drive within 50 days.
  • Healthy Drives: The path to Node 18 is the set of conditions (SMART attribute values) that predicts that at least half of the drives that meet those conditions will not fail within two years.

Predicting Short-term Drive Health

The same methodology used for predicting long-term drive health is used for predicting short-term drive health as well. The difference is that for the short-term use case, only data for a 90-day period is used. In this case, this is the data from Q1 2020 for the same Seagate drives analyzed in the previous section. The goal is to determine how well hard drive failures can be predicted 30, 60, and 90 days out.

The paper also discussed a second methodology which treats the analysis as a classification problem that occurs in a specific time window. The results are similar to the Optimal Survival Tree methodology for the period and as such, that methodology is not discussed here. Please refer to the paper for additional details.

Applying the Optimal Survival Tree methodology to the Q1 2020 data, we find that while SMART 5 is still the primary factor, the contribution of other SMART attributes has changed compared with the long-term model. For example, SMART 187 is more important, while SMART 197 has diminished in value so much that it is not considered important in assessing the short-term health of the drives. Below is the Optimal Survival Tree for predicting short-term health (“Interpretable predictive maintenance for hard drives,” Figure 6).

Optimal Survival Tree for predicting short-term health.

Traveling down the branches of the tree, we can once again see some interesting results that univariate analysis cannot see:

  • Poor Drive Health: Nodes 21 and 24 identify a set of conditions (SMART attribute values) that, if present, predict almost certain failure within 90 days.
  • Healthy Drives: Nodes 12 and 15 identify a set of conditions (SMART attribute values) that, if present, identify healthy drives with little chance of failure within 90 days.

How Much Data Do You Need?

One of the challenges we noted earlier with predicting drive failure was the amount of data needed to achieve the results. In predicting the long-term health of the drives, the Interpretable AI researchers first used three years of drive data. Once they determined their results, they reduced the data used to one year, 557,936 observations, and then randomly resampled 50,000 observations from that initial data set to train their model with the remainder used for testing.
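A random split like this one can be sketched with a partial Fisher-Yates shuffle. The counts come from the paper as summarized above. The helper name and the use of Math.random are our assumptions. Math.random is not seedable, so the chosen rows differ on every run, and the paper’s exact sampling code is not described here.

```javascript
// Sketch: draw a fixed-size random training sample and keep the remainder for testing.
function trainTestSplit(rows, trainSize, random = Math.random) {
  if (trainSize > rows.length) throw new RangeError("trainSize exceeds the number of rows");
  const shuffled = rows.slice();
  // Partial Fisher-Yates: only the first trainSize positions need to be randomized.
  for (let i = 0; i < trainSize; i++) {
    const j = i + Math.floor(random() * (shuffled.length - i)); // pick from the unchosen rows
    [shuffled[i], shuffled[j]] = [shuffled[j], shuffled[i]];
  }
  return { train: shuffled.slice(0, trainSize), test: shuffled.slice(trainSize) };
}

// Stand-in observation IDs with the paper's counts: 557,936 total, 50,000 for training.
const observations = Array.from({ length: 557936 }, (_, i) => i);
const { train: trainSet, test: testSet } = trainTestSplit(observations, 50000);
console.log(trainSet.length, testSet.length); // 50000 507936
```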

The resulting Optimal Survival Tree was similar to the long-term health survival tree in that the researchers were still able to identify nodes where accelerated failure was evident.

Learn More

To learn more about how Optimal Survival Trees were applied to predict hard drive failure, join one of the authors, Daisy Zhuo, PhD, co-founding partner of Interpretable AI, as she discusses the findings with Andy Klein of Backblaze. Join us live on Thursday, October 14, at 10 a.m. Pacific, or stream it any time afterward. Sign up today.

Final Thoughts

There have been many other papers attempting to apply machine learning techniques to predicting hard drive failure. The Interpretable paper was the focus of this post as I found their paper to be approachable and transparent, two traits I admire in writing. Those two traits are also defining characteristics of the word, “interpretable,” so there’s that. As for the other papers, a majority are able to predict drive failure at various levels of accuracy and confidence using a wide variety of techniques.

Hopefully it is obvious that predicting drive failure is possible, but will never be perfect. We at Backblaze don’t need it to be. If a drive fails in our environment, there are a multitude of backup strategies in place. We manage failure every day, but tools like those described in the Interpretable paper make our lives a little easier. On the flip side, if you trust your digital life to one hard drive or SSD, forget about predicting drive failure—assume it will happen today and back up your data somewhere, somehow before it does.

Want to read more about HDDs & SSDs, and be the first to know when we share our quarterly Drive Stats reports? Subscribe to the Backblaze Drive Stats newsletter today.


“Interpretable predictive maintenance for hard drives,” Maxime Amram et al., ©2021 Interpretable AI LLC. This is an open access article under the CC-BY-NC-ND license. (http://creativecommons.org/licenses/by-nc-nd/4.0/).

The post Using Machine Learning to Predict Hard Drive Failures appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

Analyst Firm Validates B2 Cloud Storage Platform’s Time and Budget Savings

Post Syndicated from Jeremy Milk original https://www.backblaze.com/blog/analyst-firm-validates-b2-cloud-storage-platforms-time-and-budget-savings/

92% time savings. 71% storage cost savings. 3.7 times lower total cost than the competition.

These are just some of the findings Enterprise Strategy Group (ESG) reported in a proprietary, economic validation analysis of Backblaze B2 Cloud Storage. To develop these findings, the ESG analysts did their proverbial research. They talked to customers. They validated use cases. They used our product and verified the accuracy of our listed pricing and cost calculator. And then, they took those results along with the knowledge they’ve gathered over 20 years of experience to quantify the bona fide benefits that organizations can expect by using the Backblaze B2 Cloud Storage platform.

Their findings are now available to the public in the new ESG Economic Validation report, “Analyzing the Economic Benefits of the Backblaze B2 Cloud Storage Platform.”

ESG’s models predicted that the Backblaze B2 Cloud Storage platform will give users an expected total cost of cloud storage that is 3.7 times lower than that of alternative cloud storage providers, with predicted savings of up to:

  • 92% less time to manage data.
  • 72% lower cost of storage.
  • 91% lower cost of downloads and transactions.
  • 89% lower cost of migration.

If you don’t have time to read the full report, the infographic below illustrates the key findings.

The Economic Value of Backblaze B2 Cloud Storage

The findings cut through the marketing noise: by choosing Backblaze B2, customers save both time and money, and you don’t have to take our word for it.

If that sounds like something you’d appreciate from a cloud partner, getting started couldn’t be easier. Sign up today to begin using Backblaze B2—your first 10GB are free.

If you’re already a B2 Cloud Storage customer—first, thank you! You can feel even more confident in your choice to work with Backblaze. Have a colleague or contact who you think would benefit from working with Backblaze, too? Feel free to share the report with your network.

The post Analyst Firm Validates B2 Cloud Storage Platform’s Time and Budget Savings appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

What’s the Diff: Backblaze S3 Compatible API vs. B2 Native API

Post Syndicated from Molly Clancy original https://www.backblaze.com/blog/whats-the-diff-backblaze-s3-compatible-api-vs-b2-native-api/

Backblaze B2 Cloud Storage enables thousands of developers—from video streaming applications like Kanopy to gaming platforms like Nodecraft—to easily store and use data in the cloud. Those files are always available for download either through a browser-compatible URL or APIs.

Backblaze supports two different suites of APIs—the Backblaze B2 Native API and the Backblaze S3 Compatible API. Sometimes, folks come to our platform knowing exactly which API they need to use, but the differences between the two are not always immediately apparent. If you’re not sure which API is best for you and your project, we’re explaining the difference today.

B2 Native API vs. S3 Compatible API: What’s the Diff?

Put simply, an application programming interface, or API, is a set of protocols that lets one application or service talk to another. An API typically includes a list of operations or calls developers can use to interact with that application (inputs) and a description of what happens when those calls are used (outputs). Both the B2 Native and S3 Compatible APIs handle the same basic functions:

  • Authentication: Providing account/bucket/file access.
  • Bucket Management: Creating and managing the buckets that hold files.
  • Upload: Sending files to the cloud.
  • Download: Retrieving files from the cloud.
  • List: Data checking/selection/comparison.

The main difference between the two APIs is that they use different syntax for the various calls. We’ll dig into other key differences in more detail below.

The Backblaze B2 Native API

The B2 Native API is Backblaze’s custom API that enables you to interact with Backblaze B2 Cloud Storage. We’ve written in detail about why we developed our own custom API instead of just implementing an S3-compatible interface from the beginning. In a nutshell, we developed it so our customers could easily interact with our cloud while enabling Backblaze to offer cloud storage at a quarter of the price of S3.

To get started, simply create a Backblaze account and enable Backblaze B2. You’ll then get access to your Application Key and Application Key ID. These let you call the B2 Native API.
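
The B2 Native API is plain HTTPS and JSON, so the first call, b2_authorize_account, can be sketched with nothing but the Python standard library. This is a minimal sketch that assumes the v2 endpoint URL shown below and HTTP Basic authentication built from the Application Key ID and Application Key; check the current B2 Native API documentation before relying on the exact URL or response fields. The function names are hypothetical.

```python
import base64
import json
import urllib.request

# Assumed v2 endpoint for b2_authorize_account; verify against current docs.
AUTHORIZE_URL = "https://api.backblazeb2.com/b2api/v2/b2_authorize_account"


def basic_auth_header(key_id: str, application_key: str) -> str:
    """Build an HTTP Basic Authorization value from the key ID and key."""
    raw = f"{key_id}:{application_key}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")


def authorize_account(key_id: str, application_key: str) -> dict:
    """Call b2_authorize_account and return the parsed JSON response.

    In the v2 response, fields such as authorizationToken, apiUrl, and
    downloadUrl are what later native calls build on.
    """
    request = urllib.request.Request(
        AUTHORIZE_URL,
        headers={"Authorization": basic_auth_header(key_id, application_key)},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Later native calls send the returned authorizationToken in their Authorization header and are addressed to the returned apiUrl.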

The Backblaze S3 Compatible API

Over the years since we launched Backblaze B2 and the B2 Native API, S3 compatibility was one of our most requested features. When S3 launched in 2006, it solved a persistent problem for developers: provisioning and maintaining storage hardware. Prior to S3, developers had to estimate very accurately how much storage hardware their applications would need, or risk crashes from having too little or, on the flip side, paying too much as a result of over-provisioning. S3 gave them unlimited, scalable storage that eliminated the need to provision and buy hardware. For developers, the service was a game-changer, and in the years that followed, the S3 API essentially became the industry standard for object storage.

In those years as well, other brands (That’s us!) entered the market. AWS was no longer the only game in town. Many customers wanted to move from Amazon S3 to Backblaze B2, but didn’t want to rewrite code that already worked with the S3 API.

The Backblaze S3 Compatible API does the same thing as the B2 Native API—it allows you to interact with B2 Cloud Storage—but it follows the S3 syntax. With the S3 Compatible API, if your application is already written to use the S3 API, B2 Cloud Storage will just work, with minimal code changes on your end. The launch of the S3 Compatible API provides developers with a number of benefits:
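
Because the S3 Compatible API follows S3 syntax, pointing an existing S3 client at B2 Cloud Storage is mostly a matter of changing the endpoint and credentials. A minimal sketch, assuming the boto3 SDK and an endpoint of the form `https://s3.<region>.backblazeb2.com`; the region `us-west-004` used below is only a placeholder for whatever region your bucket lives in, and the helper name is hypothetical.

```python
def b2_s3_client_kwargs(region: str, key_id: str, application_key: str) -> dict:
    """Keyword arguments for boto3.client() that target Backblaze B2.

    The region is a placeholder; use the endpoint shown for your bucket.
    """
    return {
        "service_name": "s3",
        "endpoint_url": f"https://s3.{region}.backblazeb2.com",
        # The application key ID and key take the place of AWS credentials.
        "aws_access_key_id": key_id,
        "aws_secret_access_key": application_key,
    }
```

With boto3 installed, `boto3.client(**b2_s3_client_kwargs("us-west-004", KEY_ID, APP_KEY))` should return a client on which existing S3 calls such as `list_buckets()` or `upload_file()` work unchanged; only the connection settings differ.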

  • You don’t have to learn a new API.
  • You can use your existing tools that are written to the S3 API.
  • Performance will be just as good and you’ll get all the benefits of B2 Cloud Storage.

To get started, create a Backblaze account and head to the App Keys page. The Master Application Key will not be S3 compatible, so you’ll want to create a new key and key ID by clicking the “Add a New Application Key” button. Your new key and key ID will work with both the B2 Native API and S3 Compatible API. Just plug this information into your application to connect it to Backblaze B2.

Find the App Keys page in your Backblaze account to create your S3-compatible key and key ID.

If your existing tools are written to the S3 API—for example, tools like Cohesity, rclone, and Veeam—they’ll work automatically once you enter this information. Additionally, many tools—like Arq, Synology, and MSP360—were already integrated with Backblaze B2 via the B2 Native API, but now customers can choose to connect with them via either API suite.

B2 Native and S3 Compatible APIs: How Do They Compare?

Beyond the syntax, there are a few key differences between the B2 Native and Backblaze S3 Compatible APIs, including:

  • Key management.
  • SDKs.
  • Pre-signed URLs.
  • File uploads.

Key Management

Key management is unique to the B2 Native API. The S3 Compatible API does not support key management. With key management, you can create, delete, and list keys using the following calls:

  • b2_create_key
  • b2_delete_key
  • b2_list_keys
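
Each of these is an authenticated HTTPS POST with a JSON body. As a minimal sketch, here is how a b2_create_key request might be assembled with the Python standard library. It assumes the v2 endpoint path and the accountId, keyName, capabilities, bucketId, namePrefix, and validDurationInSeconds fields as we understand them from the B2 Native API documentation (including our understanding that namePrefix requires bucketId). The api_url and auth_token come from b2_authorize_account; the function name and example values are illustrative.

```python
import json
import urllib.request
from typing import Iterable, Optional


def create_key_request(
    api_url: str,
    auth_token: str,
    account_id: str,
    key_name: str,
    capabilities: Iterable[str],
    bucket_id: Optional[str] = None,
    name_prefix: Optional[str] = None,
    valid_duration_seconds: Optional[int] = None,
) -> urllib.request.Request:
    """Build (but do not send) a b2_create_key POST request."""
    body = {
        "accountId": account_id,
        "keyName": key_name,
        "capabilities": list(capabilities),
    }
    # Optional restrictions: limit the key to one bucket and/or a file-name prefix.
    if bucket_id is not None:
        body["bucketId"] = bucket_id
    if name_prefix is not None:
        if bucket_id is None:
            raise ValueError("name_prefix requires bucket_id")
        body["namePrefix"] = name_prefix
    if valid_duration_seconds is not None:
        body["validDurationInSeconds"] = valid_duration_seconds
    return urllib.request.Request(
        f"{api_url}/b2api/v2/b2_create_key",
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": auth_token, "Content-Type": "application/json"},
        method="POST",
    )
```

Sending it is one more call, `urllib.request.urlopen(request)`. A read-only key scoped to a single bucket and prefix is exactly the kind of fine-grained access the S3 Compatible API does not let you create.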

SDKs

Some of our Alliance Partners asked us if we had an SDK they could use. To answer that request, we developed an official Java SDK and Python SDK on GitHub so you can manage and configure your cloud resources via the B2 Native API.

Meanwhile, long-standing, open-source SDKs for S3-compatible APIs are available in many languages, including Go, PHP, JavaScript, and Ruby. These SDKs make it easy to integrate your application in most popular programming languages.

What Is an SDK?

SDK stands for software development kit. It is a set of software development tools, documentation, libraries, code snippets, and guides that come in one package developers can install. Developers use SDKs to build applications for the specific platform, programming language, or system the SDK serves.

Pre-signed URLs

By default, access to private buckets is restricted to the account owner. If you want to grant access to a specific object in that bucket to anyone else—for example, a user or a different application or service—they need proper authorization. The S3 Compatible API and the B2 Native API handle access to private buckets differently.

The S3 Compatible API handles authorization using pre-signed URLs. It requires the user to calculate a signature (code that says you are who you say you are) before sending the request. Using the URL, a user can either read an object, write an object, or update an existing object. The URL also contains specific parameters like limitations or expiration dates to manage their usage.

The S3 Compatible API supports pre-signed URLs for downloading and uploading. Pre-signed URLs are built into AWS SDKs. They can also be generated in a number of other ways including the AWS CLI and AWS Tools for PowerShell. You can find guides for configuring those tools here. Many integrations, for example, Cyberduck, also offer a simple share functionality that makes providing temporary access possible utilizing the underlying pre-signed URL.
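
Under the hood, an S3 pre-signed URL carries an AWS Signature Version 4 (SigV4) signature in its query parameters. The SDKs compute this for you (boto3’s `generate_presigned_url`, for example), so the following is only a minimal standard-library sketch of what “calculating a signature” involves. It assumes a path-style URL (`https://<host>/<bucket>/<key>`), a GET request, and an unsigned payload; the host, bucket, key, and credentials are placeholders, and it does not cover every edge case the SDKs handle.

```python
import datetime
import hashlib
import hmac
from typing import Optional
from urllib.parse import quote

MAX_EXPIRES = 7 * 24 * 3600  # SigV4 presigned URLs are limited to seven days


def _sign(key: bytes, message: str) -> bytes:
    return hmac.new(key, message.encode("utf-8"), hashlib.sha256).digest()


def presign_get_url(
    host: str,
    bucket: str,
    key: str,
    access_key: str,
    secret_key: str,
    region: str,
    expires: int = 3600,
    now: Optional[datetime.datetime] = None,
) -> str:
    """Return a path-style SigV4 presigned GET URL for bucket/key."""
    if not 1 <= expires <= MAX_EXPIRES:
        raise ValueError("expires must be between 1 second and 7 days")
    now = now or datetime.datetime.now(datetime.timezone.utc)
    amz_date = now.strftime("%Y%m%dT%H%M%SZ")
    date_stamp = now.strftime("%Y%m%d")
    scope = f"{date_stamp}/{region}/s3/aws4_request"

    # 1. Canonical request: URI-encode the path (keeping "/") and the query.
    canonical_uri = "/" + quote(bucket, safe="") + "/" + quote(key, safe="/")
    params = {
        "X-Amz-Algorithm": "AWS4-HMAC-SHA256",
        "X-Amz-Credential": f"{access_key}/{scope}",
        "X-Amz-Date": amz_date,
        "X-Amz-Expires": str(expires),
        "X-Amz-SignedHeaders": "host",
    }
    canonical_query = "&".join(
        f"{quote(k, safe='')}={quote(v, safe='')}" for k, v in sorted(params.items())
    )
    canonical_request = "\n".join(
        ["GET", canonical_uri, canonical_query, f"host:{host}\n", "host", "UNSIGNED-PAYLOAD"]
    )

    # 2. String to sign: algorithm, timestamp, scope, hash of the canonical request.
    string_to_sign = "\n".join(
        ["AWS4-HMAC-SHA256", amz_date, scope,
         hashlib.sha256(canonical_request.encode("utf-8")).hexdigest()]
    )

    # 3. Derive the signing key from the secret and the scope, then sign.
    signing_key = _sign(("AWS4" + secret_key).encode("utf-8"), date_stamp)
    for part in (region, "s3", "aws4_request"):
        signing_key = _sign(signing_key, part)
    signature = hmac.new(signing_key, string_to_sign.encode("utf-8"), hashlib.sha256).hexdigest()

    return f"https://{host}{canonical_uri}?{canonical_query}&X-Amz-Signature={signature}"
```

The expiration and credential scope are part of what gets signed, so anyone holding the URL can use it only for that object, that method, and that time window.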

The B2 Native API figures out the signature for you. Instead of a pre-signed URL, the B2 Native API requires an authorization token to be part of the API request itself. The b2_authorize_account request gets the authorization token that you can then use for account-level operations. If you only want to authorize downloads instead of all account-level operations, you can use the request b2_get_download_authorization to generate an authorization token, which can then be used in the URL to authenticate the request.
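
A minimal sketch of the download-only flow with the Python standard library, assuming the v2 endpoint path, the bucketId, fileNamePrefix, and validDurationInSeconds request fields, and our reading of the B2 documentation that a download-by-name URL accepts the token as an `Authorization` query parameter. The api_url and download_url values come from b2_authorize_account; the function names and example values are hypothetical.

```python
import json
import urllib.request
from urllib.parse import quote


def download_authorization_request(api_url: str, auth_token: str, bucket_id: str,
                                   file_name_prefix: str,
                                   valid_seconds: int) -> urllib.request.Request:
    """Build (but do not send) a b2_get_download_authorization request."""
    body = {
        "bucketId": bucket_id,
        "fileNamePrefix": file_name_prefix,
        "validDurationInSeconds": valid_seconds,
    }
    return urllib.request.Request(
        f"{api_url}/b2api/v2/b2_get_download_authorization",
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": auth_token, "Content-Type": "application/json"},
        method="POST",
    )


def authorized_download_url(download_url: str, bucket_name: str, file_name: str,
                            download_token: str) -> str:
    """URL that downloads one file by name, authorized via a query parameter."""
    return (f"{download_url}/file/{quote(bucket_name, safe='')}/{quote(file_name, safe='/')}"
            f"?Authorization={quote(download_token, safe='')}")
```

The token returned by the first call only covers files under the given prefix in that bucket, so sharing the resulting URL does not expose the rest of the account.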

Uploading Files

With the S3 Compatible API, you upload files to a static URL that never changes. Our servers automatically pick the best route for you that delivers the best possible performance on our backend.

The B2 Native API requires a separate call to get an upload URL. This URL can be used until it goes stale (i.e. returns a 50X error), at which point another upload URL must be requested. In the event of a 50X error, you simply need to retry the request with the new URL. The S3 Compatible API does this for you in the background on our servers, which makes the experience of using the S3 Compatible API smoother.
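
The retry pattern described above can be sketched as a small loop. This is a minimal illustration, not a client library: `get_upload_url` stands in for a call to b2_get_upload_url and `upload` for the POST to the returned URL (both are hypothetical callables you would supply), and only 5xx responses are retried, following the description above.

```python
import time
from typing import Callable, Tuple

UploadTarget = Tuple[str, str]  # (upload URL, upload authorization token)


def upload_with_retry(
    get_upload_url: Callable[[], UploadTarget],
    upload: Callable[[str, str, bytes], int],
    data: bytes,
    max_attempts: int = 5,
    backoff_seconds: float = 1.0,
) -> int:
    """Upload data, requesting a fresh upload URL whenever the last one goes stale."""
    for attempt in range(max_attempts):
        url, token = get_upload_url()
        status = upload(url, token, data)
        if 200 <= status < 300:
            return status
        if status < 500:
            # Other statuses are treated as non-retryable in this sketch.
            raise RuntimeError(f"upload failed with status {status}")
        # 5xx: this URL is stale or busy; back off, then ask for a new one.
        if attempt + 1 < max_attempts:
            time.sleep(backoff_seconds * (2 ** attempt))
    raise RuntimeError(f"upload still failing after {max_attempts} attempts")
```

With the S3 Compatible API, this loop effectively runs on Backblaze’s side, which is why the static upload URL feels simpler to use.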

This difference in the upload process is what enabled Backblaze B2 to offer substantially lower prices at the expense of a little bit more complexity. You can read more about that here.

Try It Out for Yourself

So, which API should you use? In a nutshell, if your app is already written to work with S3, if you’re using tools that are written to S3, or if you’re just unsure, the S3 Compatible API is a good choice. If you’re looking for more control over access and key management, the B2 Native API is the way to go. Either way, now that you understand the differences between the two APIs you can use to work with B2 Cloud Storage, you can align your use cases to the functionality that best suits them and get started with the API that works best for you.

If you’re ready to try out the B2 Native or S3 Compatible APIs for yourself, check out our documentation for each API.

Of course, if you have any questions, fire away in the comments or reach out to our Sales team. And if you’re interested in trying Backblaze, get started today and your first 10GB of storage are free.

The post What’s the Diff: Backblaze S3 Compatible API vs. B2 Native API appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

Are SSDs Really More Reliable Than Hard Drives?

Post Syndicated from Andy Klein original https://www.backblaze.com/blog/are-ssds-really-more-reliable-than-hard-drives/

Solid-state drives (SSDs) continue to become more and more a part of the data storage landscape. And while our SSD 101 series has covered topics like upgrading, troubleshooting, and recycling your SSDs, we’d like to test one of the more popular declarations from SSD proponents: that SSDs fail much less often than our old friend, the hard disk drive (HDD). This statement is generally attributed to SSDs having no moving parts and is supported by vendor proclamations and murky mean time between failure (MTBF) computations. All of that is fine for SSD marketing purposes, but for comparing failure rates, we prefer the Drive Stats way: direct comparison. Let’s get started.

What Does Drive Failure Look Like for SSDs and HDDs?

In our quarterly Drive Stats reports, we define hard drive failure as either reactive, meaning the drive is no longer operational, or proactive, meaning we believe that drive failure is imminent. For hard drives, much of the data we use to determine a proactive failure comes from the SMART stats we monitor that are reported by the drive.

SMART, or S.M.A.R.T., stands for Self-monitoring, Analysis, and Reporting Technology and is a monitoring system included in HDDs and SSDs. The primary function of SMART is to report on various indicators related to drive reliability with the intent being to anticipate drive failures. Backblaze records the SMART attributes for every data and boot drive in operation each day.
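
On Linux, one common way to read these attributes is the smartmontools command `smartctl -A`, which prints one row per attribute. The sketch below parses that table layout into a dictionary keyed by attribute ID. The sample text is illustrative (not from a Backblaze drive), this is not a description of Backblaze’s own collection pipeline, and NVMe-attached SSDs report health data in a different format that this parser does not handle.

```python
from typing import Dict, Tuple

SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  9 Power_On_Hours          0x0032   097   097   000    Old_age   Always       -       10452
194 Temperature_Celsius     0x0022   064   052   000    Old_age   Always       -       36 (Min/Max 20/48)
"""


def parse_smart_attributes(text: str) -> Dict[int, Tuple[str, str]]:
    """Map SMART attribute ID -> (attribute name, raw value) from smartctl -A output."""
    attributes = {}
    for line in text.splitlines():
        fields = line.split(None, 9)  # the raw value may itself contain spaces
        if len(fields) == 10 and fields[0].isdigit():
            attributes[int(fields[0])] = (fields[1], fields[9])
    return attributes
```

Keying by numeric ID rather than by name matters because, as noted below, vendors do not always agree on attribute names.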

As with HDDs, we also record and monitor SMART stats for SSDs. Different SSD models report different SMART stats, with some overlap. To date, we record 31 SMART attributes related to SSDs, 25 of which are listed below.

SMART ID  Description
1         Read Error Rate
5         Reallocated Sectors Count
9         Power-on Hours
12        Power Cycle Count
13        Soft Read Error Rate
173       SSD Wear Leveling Count
174       Unexpected Power Loss Count
177       Wear Range Delta
179       Used Reserved Block Count Total
180       Unused Reserved Block Count Total
181       Program Fail Count Total
182       Erase Fail Count
192       Unsafe Shutdown Count
194       Temperature Celsius
195       Hardware ECC Recovered
198       Uncorrectable Sector Count
199       UltraDMA CRC Error Count
201       Soft Read Error Rate
202       Data Address Mark Errors
231       Life Left
232       Endurance Remaining
233       Media Wearout Indicator
235       Good Block Count
241       Total LBAs Written
242       Total LBAs Read

For the remaining six (16, 17, 168, 170, 218, and 245), we have not been able to find definitions. Please reach out in the comments if you can shed any light on the missing attributes.

All that said, we are just at the beginning of using SMART stats to proactively fail an SSD. Many of the attributes cited are drive model or vendor dependent. In addition, as you’ll see, there are a limited number of SSD failures, which limits the amount of data we have for research. As we add more SSDs to our farm and monitor them, we intend to build out our rules for proactive SSD failure. In the meantime, all of the SSDs that have failed to date are reactive failures, that is: They just stopped working.

Comparing Apples to Apples

In the Backblaze data centers, we use both SSDs and HDDs as boot drives in our storage servers. In our case, describing these drives as boot drives is a misnomer as boot drives are also used to store log files for system access, diagnostics, and more. In other words, these boot drives are regularly reading, writing, and deleting files in addition to their named function of booting a server at startup.

In our first storage servers, we used hard drives as boot drives as they were inexpensive and served the purpose. This continued until mid-2018, when we were able to buy 200GB SSDs for about $50, which was our top-end price point for a boot drive in each storage server. It was an experiment, but things worked out so well that we switched to using only SSDs in new storage servers and replaced failed HDD boot drives with SSDs as well.

What we have are two groups of drives, SSDs and HDDs, which perform the same functions, have the same workload, and operate in the same environment over time. So naturally, we decided to compare the failure rates of the SSD and HDD boot drives. Below are the lifetime failure rates for each cohort as of Q2 2021.

SSDs Win… Wait, Not So Fast!

It’s over, SSDs win. It’s time to turn your hard drives into bookends and doorstops and buy SSDs. Although, before you start playing dominoes with your hard drives, there are a couple of things to consider which go beyond the face value of the table above: average age and drive days.

  • The average age of the SSD drives is 14.2 months, and the average age of the HDD drives is 52.4 months.
  • The oldest SSD drives are about 33 months old and the youngest HDD drives are 27 months old.

Basically, the timelines for the average age of the SSDs and HDDs don’t overlap very much. The HDDs are, on average, more than three years older than the SSDs. This places each cohort at very different points in their lifecycle. If you subscribe to the idea that drives fail more often as they get older, you might want to delay your HDD shredding party for just a bit.

By the way, we’ll be publishing a post in a couple of weeks on how well drive failure rates fit the bathtub curve; SPOILER ALERT: old drives fail a lot.

The other factor we listed was drive days, the number of days all the drives in each cohort have been in operation without failing. The wide disparity in drive days causes a big difference in the confidence intervals of the two cohorts as the number of observations (i.e. drive days) varies significantly.
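
To see why fewer drive days means a wider confidence interval, it helps to write the calculation down. Backblaze’s Drive Stats reports compute the annualized failure rate as AFR = failures / (drive days / 365) × 100%. The interval method below is an assumption for illustration (a normal approximation to a Poisson count of failures); this post does not say which method produced its tables, and the inputs are made-up numbers, not Backblaze data.

```python
import math
from typing import Tuple


def annualized_failure_rate(failures: int, drive_days: float) -> float:
    """AFR in percent: failures per drive-year of operation, times 100."""
    drive_years = drive_days / 365
    return failures / drive_years * 100


def afr_confidence_interval(failures: int, drive_days: float,
                            z: float = 1.96) -> Tuple[float, float]:
    """Approximate 95% interval for the AFR (normal approximation to Poisson).

    An illustrative assumption; exact Poisson intervals are better when the
    failure count is small.
    """
    drive_years = drive_days / 365
    half_width = z * math.sqrt(failures)
    low = max(failures - half_width, 0.0) / drive_years * 100
    high = (failures + half_width) / drive_years * 100
    return low, high


# Hypothetical cohorts with the same 1% AFR but ten times the drive days:
small = afr_confidence_interval(10, 365_000)      # roughly (0.38, 1.62)
large = afr_confidence_interval(100, 3_650_000)   # roughly (0.80, 1.20)
```

Both hypothetical cohorts have the same AFR, but the one with ten times the drive days has an interval less than a third as wide, which is the effect described above.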

To create a more accurate comparison, we can attempt to control for average age and drive days in our analysis. To do this, we can take the HDD cohort back in time in our records to find a point where its average age and drive days are similar to those of the SSDs in Q2 2021. That allows us to compare each cohort at the same point in its life cycle.

Turning back the clock on the HDDs, we used the HDD data from Q4 2016 to create the following comparison.

Suddenly, the annualized failure rate (AFR) difference between SSDs and HDDs is not so massive. In fact, each drive type is within the other’s 95% confidence interval window. That window is fairly wide (plus or minus 0.5%) because of the relatively small number of drive days.

Where does that leave us? We have some evidence that when both types of drives are young (14 months on average in this case), the SSDs fail less often, but not by much. But you don’t buy a drive to last 14 months, you want it to last years. What do we know about that?

Failure Rates Over Time

We have data for HDD boot drives that go back to 2013 and for SSD boot drives going back to 2018. The chart below is the lifetime AFR for each drive type through Q2 2021.

As the graph shows, beginning in 2018, the HDD boot drive failure rate accelerated. This continued in 2019 and 2020 before leveling off in 2021 (so far). To state the obvious, as the age of the HDD boot drive fleet increased, so did the failure rate.

One point of interest is the similarity of the two curves through their first four data points. For the HDD cohort, year five (2018) was where the failure rate acceleration began. Is the same fate awaiting our SSDs as they age? While we can expect some increase in the AFR as the SSDs age, will it be as dramatic as the HDD line?

Decision Time: SSD or HDD

Where does that leave us in choosing between buying an SSD or an HDD? Given what we know to date, using the failure rate as a factor in your decision is questionable. Once we controlled for age and drive days, the two drive types were similar, and the difference was certainly not enough by itself to justify the extra cost of purchasing an SSD versus an HDD. At this point, you are better off deciding based on other factors: cost, speed required, electricity, form factor requirements, and so on.

Over the next couple of years, as we get a better idea of SSD failure rates, we will be able to decide whether or not to add the AFR to the SSD versus HDD buying guide checklist. Until then, we look forward to continued debate.

The post Are SSDs Really More Reliable Than Hard Drives? appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

Announcing Backblaze Developer Day: Build Blazing Apps

Post Syndicated from Jeremy Milk original https://www.backblaze.com/blog/announcing-backblaze-developer-day-build-blazing-apps/

Join us for our inaugural Backblaze Developer Day on October 21st. This event is jam-packed with announcements, tech talks, lessons, SWAG, and more to help you understand how Backblaze B2 Cloud Storage can work for you. And it’s free. The good news just keeps coming.

Here’s What’s on the Horizon:

  • What’s New: Learn about brand new and recent partner alliances and integrations to serve more of your development needs.
  • Tour With Some Legends: Join Co-founder and CTO, Brian Wilson, and our Director of Evangelism, Andy Klein (of Drive Stats fame), for a decidedly unscripted, sure-to-be unexpected tour through the B2 Cloud Storage architecture, including APIs, SDKs, and CLI.
  • How to Put It Together: Get a rapid demo of one of our popular B2 Cloud Storage + compute + CDN combinations, which can free your budget and your tech to do more.
  • A Panel on Tomorrow’s Development: The sunset of monolithic, closed ecosystems is here, so join us to discuss the future of microservices and interoperability.
  • What Comes Next: Finally, hear what’s next on the B2 Cloud Storage roadmap—and tell our head of product what you think should come next.

And so much more: We’ll be posting updates on partners and friends that will be joining us, as well as information about getting SWAG from the inaugural Backblaze Developer Day. Keep an eye on this space… So register today for free to grab your spot and we’ll see you on October 21st.

The post Announcing Backblaze Developer Day: Build Blazing Apps appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

Disaster Recovery With a Single Command

Post Syndicated from Natasha Rabinov original https://www.backblaze.com/blog/disaster-recovery-with-a-single-command/

Just before 5 a.m. on May 7, 2021, a ransom note from the DarkSide ransomware syndicate flashed across a Colonial Pipeline Co. employee’s computer screen. More than a month later, Joseph Blount, Colonial’s CEO, testified before Congress that recovery from the attack was still not complete.

Like most companies, Colonial maintained backups they could use to recover. They eventually did rely on those backups, but not before paying the ransom and attempting to decrypt files using a tool provided by the cybercriminals that proved too slow to be practical. The lesson here: Backup is vital as part of a disaster recovery plan, but the actual “recovery” (how you get your business back online using that backup data) is just as important. Few businesses can survive the hit of weeks or months spent offline.

Maybe your backup system is well-established, but the lift of recovery planning is overshadowed by more immediate demands on your team when you’re already stretched too thin. Or maybe you’ve looked into disaster recovery solutions, but they’re not right-sized for your business. Either way, if you’re using Veeam to manage backups, you can now consider your disaster recovery playbook complete.

Announcing Backblaze Instant Recovery in Any Cloud

We’re excited to announce Backblaze Instant Recovery in Any Cloud—an infrastructure as code (IaC) package that makes ransomware recovery into a VMware/Hyper-V based cloud easy to plan for and execute for any IT team.

“Most businesses know that backing up is critical for disaster recovery. But we see time and again that organizations under duress struggle with getting their systems back online, and that’s why Backblaze’s new solution can be a game-changer.”
—Mark Potter, CISO, Backblaze

When you discover a ransomware attack on your network, scrambling to get servers up and running is the last thing you want to worry about. Instant Recovery in Any Cloud provides businesses an easy and flexible path to as-soon-as-possible disaster recovery in the event of a ransomware attack by spinning up servers in a VMware/Hyper-V based cloud.

Teams can use an industry-standard automation tool to run a single command that quickly brings up an orchestrated combination of on-demand servers, firewalls, networking, storage, and other infrastructure in phoenixNAP, drawing data for your VMware/Hyper-V based environment immediately from Veeam® Backup & Replication™ backups, so businesses can get back online with minimal disruption or expense. Put simply, it’s an on-demand path to a rock-solid disaster recovery plan.

Below, we’ll explain the why and how of this solution, but if you’d like to learn more or ask questions, we’re offering a webinar on October 20 at 10 a.m. Pacific and have a full Knowledge Base article available here.

From 3-2-1 to Immutable Backups to Disaster Recovery

For many years, the 3-2-1 backup strategy was the gold standard for data protection. However, bad actors have become much more sophisticated. It’s a bad day for cybercriminals when victims can just restore and move on with their lives, so they started targeting backups alongside production data.

The introduction of Object Lock functionality allowed businesses to protect their backups from ransomware by making them immutable, meaning they are safe from encryption or deletion in case of attack. With immutable backups, you can access a working, uncorrupted copy of your data. But that is only the first step. The critical second step is making that data useful. The time to get back to business after an attack often depends on how quickly backup data can be brought online—more than any other factor. Few businesses can afford to go weeks or months offline, but even with immutable backups, they may be forced to if they don’t have a plan for putting those backups to work.
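
At the API level, Object Lock is a retention setting attached to an object version when it is written. In this solution Veeam manages immutability for you, so the following is only a minimal sketch of the underlying S3 Compatible API request. It assumes boto3’s `put_object` parameters `ObjectLockMode` and `ObjectLockRetainUntilDate` and a bucket that already has Object Lock enabled; the bucket and key names are placeholders, and the helper name is hypothetical.

```python
import datetime
from typing import Optional


def object_lock_put_kwargs(
    bucket: str,
    key: str,
    body: bytes,
    retain_days: int,
    mode: str = "COMPLIANCE",
    now: Optional[datetime.datetime] = None,
) -> dict:
    """Arguments for an S3 put_object call that writes a locked object version."""
    if mode not in ("GOVERNANCE", "COMPLIANCE"):
        raise ValueError("mode must be GOVERNANCE or COMPLIANCE")
    if retain_days <= 0:
        raise ValueError("retain_days must be positive")
    now = now or datetime.datetime.now(datetime.timezone.utc)
    return {
        "Bucket": bucket,
        "Key": key,
        "Body": body,
        # COMPLIANCE: this version cannot be deleted or altered until the date.
        # GOVERNANCE: specially permitted users can override the lock.
        "ObjectLockMode": mode,
        "ObjectLockRetainUntilDate": now + datetime.timedelta(days=retain_days),
    }
```

With a client configured for the B2 endpoint, this would be used as `s3.put_object(**object_lock_put_kwargs("backups", "vm01.vbk", data, 30))`. Depending on the SDK version and provider, an integrity header such as Content-MD5 may also be required for locked uploads.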

“For more than 400,000 Veeam customers, flexibility around disaster recovery options is essential,” said Andreas Neufert, Vice President of Product Management, Alliances at Veeam. “They need to know not only that their backups are safe, but that they’re practically usable in their time of need. We’re very happy to see Backblaze offering instant restore for all backups to VMware and Hyper-V based cloud offerings to help our joint customers thrive during challenging times.”

Disaster Recovery That Fits Your Needs

The most robust disaster recovery plans are built for enterprise customers and enterprise budgets. They typically involve paying for compute functionality on an ongoing basis as an “insurance policy” to have the ability to quickly spin up a server in case of an attack. Instant Recovery in Any Cloud opens disaster recovery to a huge number of businesses that were left without affordable solutions.

For savvy IT teams, this is essentially a cut and paste setup—an incredibly small amount of work to architect a recovery plan. The solution is written to work with phoenixNAP, and can be customized for other compute providers without difficulty.

Just as virtualized environments allow you to pay for processing power as needed rather than provisioning dedicated servers for every part of your business, Backblaze Instant Recovery in Any Cloud allows you to provision compute power on demand in a VMware and Hyper-V based cloud. The capacity is always there from Backblaze and phoenixNAP, but you don’t pay for it until you need it.

The code also allows you to implement a multi-cloud, vendor-agnostic disaster recovery approach rather than relying on just one platform—you can spin up a server in any compute environment you prefer. And because the recovery is entirely cloud based, you can execute this recovery plan from anywhere you’re able to access your accounts. Even if your whole network is down, you can still get your recovery plan rolling.

IT hero: 1
Ransomware: 0

How It Works and What You Need

Instant Recovery in Any Cloud works through a pre-built code package IT staff can use to create a digital mirror image of the infrastructure they have deployed on-premises in a VMware or Hyper-V based cloud. The code package is built in Ansible, an open-source tool which enables IaC. Running an Ansible playbook allows you to provision and configure infrastructure and deploy applications as needed. All components are pre-configured within the script. In order to get started, you can find the appropriate instructions on our GitHub page.
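
The “single command” is an `ansible-playbook` run. As a minimal sketch, here is one way to wrap that command in Python so it can be scripted or rehearsed during recovery drills. The playbook file, inventory file, and variable names are hypothetical placeholders, not the names used in Backblaze’s GitHub package, which has its own instructions.

```python
import json
import subprocess
from typing import Dict, List


def playbook_command(playbook: str, inventory: str, extra_vars: Dict[str, str]) -> List[str]:
    """Build an ansible-playbook command line.

    Extra variables are passed as one JSON string, which -e/--extra-vars
    accepts and which avoids shell-quoting problems.
    """
    return [
        "ansible-playbook",
        "-i", inventory,
        "-e", json.dumps(extra_vars, sort_keys=True),
        playbook,
    ]


def run_playbook(playbook: str, inventory: str, extra_vars: Dict[str, str]) -> None:
    # check=True raises CalledProcessError if the playbook run fails.
    subprocess.run(playbook_command(playbook, inventory, extra_vars), check=True)
```

Passing the argument list directly to `subprocess.run` (rather than a shell string) keeps credentials and paths with spaces from being misinterpreted.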

If you haven’t already, you also need to set up Backblaze B2 Cloud Storage as part of a Scale-Out Backup Repository with Immutability in Veeam using the Backblaze S3 Compatible API, and your data needs to be backed up securely before deploying the command.

Check out our step-by-step instructions for more detail and save the code below for future use.

Prepare for an Attack Before Disaster Strikes

With ransomware on the rise, disaster recovery is more important than ever. With tools like Object Lock and Backblaze Instant Recovery in Any Cloud, it doesn’t have to be complicated and costly. Make sure your backups are immutable with Object Lock, and keep the Ansible playbook and instructions on hand as part of a bigger ransomware recovery plan so that you’re ready in the event of an attack. Simply spin up servers and restore backups in a safe environment to minimize disruption to your business.

If you already have Veeam, you can create a Backblaze B2 account to get started. It’s free, easy, and quick, and you can stay one step ahead of cybercriminals.

The post Disaster Recovery With a Single Command appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

What’s the Diff: Hybrid Cloud vs. Multi-cloud

Post Syndicated from Molly Clancy original https://www.backblaze.com/blog/whats-the-diff-hybrid-cloud-vs-multi-cloud/

For as often as the terms multi-cloud and hybrid cloud get misused, it’s no wonder the concepts put a lot of very smart heads in a spin. The differences between a hybrid cloud and a multi-cloud strategy are simple, but choosing between the two models can have big implications for your business.

In this post, we’ll explain the difference between hybrid cloud and multi-cloud, describe some common use cases, and walk through some ways to get the most out of your cloud deployment.

What’s the Diff: Hybrid Cloud vs. Multi-cloud

Both hybrid cloud and multi-cloud strategies spread data over, you guessed it, multiple clouds. The difference lies in the type of cloud environments—public or private—used to do so. To understand the difference between hybrid cloud and multi-cloud, you first need to understand the differences between the two types of cloud environments.

A public cloud is operated by a third-party vendor that sells data center resources to multiple customers over the internet. Much like renting an apartment in a high rise, tenants rent computing space and benefit from not having to worry about upkeep and maintenance of computing infrastructure. In a public cloud, your data may be on the same server as another customer, but it’s virtually separated from other customers’ data by the public cloud’s software layer. Companies like Amazon, Microsoft, Google, and us here at Backblaze are considered public cloud providers.

A private cloud, on the other hand, is akin to buying a house. In a private cloud environment, a business or organization typically owns and maintains all the infrastructure, hardware, and software to run a cloud on a private network.

Private clouds are usually built on-premises, but can be maintained off-site at a shared data center. You may be thinking, “Wait a second, that sounds a lot like a public cloud.” You’re not wrong. The key difference is that, even if your private cloud infrastructure is physically located off-site in a data center, the infrastructure is dedicated solely to you and typically protected behind your company’s firewall.

What Is Hybrid Cloud Storage?

A hybrid cloud strategy uses a private cloud and public cloud in combination. Most organizations that want to move to the cloud get started with a hybrid cloud deployment. They can move some data to the cloud without abandoning on-premises infrastructure right away.

A hybrid cloud deployment also works well for companies in industries where data security is governed by industry regulations. For example, the banking and financial industry has specific requirements for network controls, audits, retention, and oversight. In a hybrid cloud strategy, a bank may keep sensitive, regulated data on a private cloud and low-risk data on a public cloud. Like financial services, health care providers also handle significant amounts of sensitive data and are subject to regulations like the Health Insurance Portability and Accountability Act (HIPAA), which requires various security safeguards that a hybrid cloud is well suited to meet.

A hybrid cloud model also suits companies or departments with data-heavy workloads like media and entertainment. They can take advantage of high-speed, on-premises infrastructure to get fast access to large media files and store data that doesn’t need to be accessed as frequently—archives and backups, for example—with a scalable, low-cost public cloud provider.

Hybrid Cloud

What Is Multi-cloud Storage?

A multi-cloud strategy uses two or more public clouds in combination. A multi-cloud strategy works well for companies that want to avoid vendor lock-in or achieve data redundancy in a failover scenario. If one cloud provider experiences an outage, they can fall back on a second cloud provider.

Companies with operations in countries that have data residency laws also use multi-cloud strategies to meet regulatory requirements. They can run applications and store data in clouds that are located in specific geographic regions.


For more information on multi-cloud strategies, check out our Multi-cloud Architecture Guide.

Ways to Make Your Cloud Storage More Efficient

Whether you use hybrid cloud storage or multi-cloud storage, it’s vital to run your cloud deployment efficiently and keep costs under control. To get the most out of your cloud strategy, we recommend the following:

  • Know your cost drivers. Cost management is one of the biggest challenges to a successful cloud strategy. Start by understanding the critical elements of your cloud bill. Track cloud usage from the beginning to validate costs against cloud invoices. And look for exceptions to historical trends (e.g., identify departments with a sudden spike in cloud storage usage and find out why they are creating and storing more data).
  • Identify low-latency requirements. Cloud data storage requires transmitting data between your location and the cloud provider. While cloud storage has come a long way in terms of speed, physical distance can still introduce latency. The average professional who relies on email, spreadsheets, and presentations may never notice it. However, a few groups in your company may require low-latency data storage (e.g., HD video editing). For those groups, a hybrid cloud approach may help.
  • Optimize your storage. If you use cloud storage for backup and records retention, your data consumption may rise significantly over time. Create a plan to clean up your data regularly so that data is deleted once it is no longer needed.
  • Prioritize security. Investing up-front time and effort in a cloud security configuration pays off. At a minimum, review cloud provider-specific training resources. In addition, make sure you apply traditional access management principles (e.g., deleting inactive user accounts after a defined period) to manage your risks.
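The “look for exceptions to historical trends” advice can be automated with very little code. Below is a minimal Python sketch, assuming you can export monthly storage usage per department from your provider’s billing or usage reports; the department names, figures, and the 1.5× threshold are hypothetical placeholders, not recommendations.

```python
from statistics import mean
from typing import Dict, List

def find_usage_spikes(history: Dict[str, List[float]], threshold: float = 1.5) -> Dict[str, float]:
    """Return departments whose latest month exceeds `threshold` times the
    average of their earlier months, mapped to that ratio.

    history: department name -> monthly usage in GB, oldest first
             (a hypothetical export format).
    """
    spikes = {}
    for dept, usage in history.items():
        if len(usage) < 2:
            continue  # need at least one earlier month as a baseline
        baseline = mean(usage[:-1])
        latest = usage[-1]
        if baseline > 0 and latest > threshold * baseline:
            spikes[dept] = round(latest / baseline, 2)
    return spikes

# Hypothetical monthly usage (GB) per department.
usage_gb = {
    "marketing": [400, 420, 410, 1200],
    "finance": [150, 155, 160, 165],
    "engineering": [2000, 2100, 2050, 2200],
}

print(find_usage_spikes(usage_gb))  # → {'marketing': 2.93}
```

Comparing each department against its own baseline, rather than a company-wide average, keeps a naturally large team from being flagged every month.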

How to Choose a Cloud Strategy

To decide between hybrid cloud storage and multi-cloud storage, consider the following questions:

  • Low-latency needs. Does your business need low-latency capabilities? If so, a hybrid cloud solution may be best.
  • Geographical considerations. Does your company have offices in multiple locations and countries with data residency regulations? In that case, a multi-cloud storage strategy with data centers in several countries may be helpful.
  • Regulatory concerns. If there are industry-specific requirements for data retention and storage, these requirements may not be fulfilled equally by all cloud providers. Ask the provider how exactly they help you meet these requirements.
  • Cost management. Pay close attention to pricing tiers at the outset, and ask the provider what tools, reports, and other resources they provide to keep costs well managed.

Still wondering what type of cloud strategy is right for you? Ask away in the comments.

The post What’s the Diff: Hybrid Cloud vs. Multi-cloud appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

What’s the Diff: SSD vs. NVMe vs. M.2 Drives

Post Syndicated from Lora Maslenitsyna original https://www.backblaze.com/blog/nvme-vs-m-2-drives/

The drive you use in your computer matters, and nowadays there are plenty of options to choose from. Especially for people looking to outfit their gaming PCs or laptops with increased performance at a lower cost, choosing the best kind of drive means weighing the advantages and disadvantages of solid-state drives (SSDs). But with a number of different types to choose from, the hunt for the SSD you need can be confusing at the outset. Given our love for digging into the differences between and performances of many different types of drives, we developed this “What’s the Diff” post to lay it all out for you.

SSDs have become a popular option because they are smaller than traditional hard drives and offer faster data read and write speeds. In this post, we’ll cover:

  • What is an SSD?
  • What is a SATA SSD?
  • What is an M.2 SSD?
  • What is an NVMe SSD?
  • Which SSD is right for you?

A Brief Introduction to SSDs

SSDs are common drives that are now standard issue in most computers, including across Apple’s line of Macs. Rather than using disks, motors, and read/write heads like hard disk drives (HDDs), SSDs use persistent flash memory to retain information. SSDs have become so common mainly because they perform at higher speeds and use less power than HDDs. You can read more about the difference between SSDs and HDDs in this post.

SSDs, namely NVMe drives and M.2 drives, are also available in a range of form factors, so they can be used in a range of devices.

Taking a Look at SATA SSDs

Serial AT Attachment (SATA) is now widely considered the storage standard for PCs. A SATA SSD is an SSD equipped with a SATA interface. SATA SSDs have the advantage of being faster than spinning-disk HDDs, but their speed caps at 600 MB/s. Generally, SATA SSDs offer lower-cost storage than M.2 or NVMe drives, so they tend to be a better option for anyone seeking a general-purpose drive on a tighter budget.

One of the disadvantages of SATA drives is that they require two cables, one for data and one for power, so they can clutter your setup and even affect airflow within a computer. Not all SSD form factors use the same type of connection, however, so SSDs also differ in speed and in how much clutter they add to your setup.

Western Digital WD Blue 1TB 3D NAND SATA SSD Solid State Drive

What Are M.2 Drives?

M.2 is a newer form factor for SSDs that plug directly into a computer’s motherboard without the need for any extra cables. M.2 SSDs are significantly smaller than traditional 2.5-inch SSDs, so they have become popular in gaming setups because they take up less space.

Even at this smaller size, M.2 SSDs are able to hold as much data as other SSDs, ranging up to 8TB in storage size. But, while they can hold just as much data and are generally faster than other SSDs, they also come at a higher cost. As the old adage goes, you can only have two of the following things: cheap, fast, or good.

People who are looking to improve their gaming setup with an M.2 SSD will need to make sure their motherboard has an M.2 slot or two. If your computer has two or more slots, you can run the drives in RAID.

An M.2 SSD can be SATA-based, PCIe-based with NVMe support, or PCIe-based without NVMe support. Because of these options, performance varies widely: an M.2 SSD with NVMe support offers up to five times more bandwidth than a SATA M.2 model, providing faster performance for file transfers, video or photo editing, transcoding, compression, and decompression.

Samsung V-NAND SSD 860 EVO SATA M.2

What Are NVMe Drives?

Non-Volatile Memory Express (NVMe) drives were introduced in 2013 to attach to the PCI Express (PCIe) slot on a motherboard instead of using SATA bandwidth. NVMe drives can usually deliver a sustained read-write speed of 3.5 GB/s, in contrast with SATA SSDs, which are limited to 600 MB/s. Because NVMe SSDs can reach higher speeds than SATA SSDs, including SATA-based M.2 drives, they are ideal for gaming or high-resolution video editing.
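To put those two figures in perspective, here is a back-of-the-envelope Python calculation of best-case transfer times. It assumes the 600 MB/s and 3.5 GB/s peak figures above, decimal units (1 GB = 1,000 MB), and a hypothetical 100GB folder; real-world transfers are usually slower than these sequential ceilings.

```python
SATA_MB_PER_S = 600     # SATA SSD ceiling cited above
NVME_MB_PER_S = 3_500   # typical sustained NVMe figure cited above (3.5 GB/s)

def transfer_seconds(size_gb: float, mb_per_s: float) -> float:
    """Best-case seconds to move size_gb (decimal GB) at a sustained rate."""
    return size_gb * 1_000 / mb_per_s

file_gb = 100  # hypothetical folder size
sata = transfer_seconds(file_gb, SATA_MB_PER_S)
nvme = transfer_seconds(file_gb, NVME_MB_PER_S)
print(f"SATA: {sata:.0f} s, NVMe: {nvme:.0f} s, speedup: {sata / nvme:.1f}x")
# → SATA: 167 s, NVMe: 29 s, speedup: 5.8x
```

The ratio of roughly 5.8× is in the same ballpark as the “up to five times more bandwidth” comparison between NVMe and SATA M.2 drives.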

Their high speeds come at a high cost, however: NVMe drives are some of the more expensive drives on the market. They also need a PCIe connection with NVMe support, which some older computers lack.


Which SSD Is Best to Use?

There are a few factors to consider in choosing which drive is best for you. As you compare the different components of your build, consider your technical constraints, budget, capacity needs, and speed priority.

Technical Constraints

Check the capability of your system before choosing a drive, as some older devices don’t have the components needed for NVMe connections. Also, check that you have enough PCIe lanes to support multiple PCIe devices. If you don’t have enough lanes, or only certain slots are wired with them, you may have to choose a different drive, or only one of your slots may be able to run an NVMe drive at full speed.
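Lane count matters because PCIe bandwidth scales roughly linearly with the number of lanes. The Python sketch below estimates one-direction link bandwidth, assuming the published PCIe 3.0 and 4.0 signalling rates (8 and 16 GT/s per lane) with 128b/130b encoding, and ignoring packet overhead, so real drives land somewhat lower.

```python
# Raw per-lane signalling rate in gigatransfers per second (GT/s).
GT_PER_S = {3: 8.0, 4: 16.0}
ENCODING_EFFICIENCY = 128 / 130  # PCIe 3.0 and 4.0 use 128b/130b line encoding

def link_mb_per_s(gen: int, lanes: int) -> float:
    """Approximate one-direction PCIe link bandwidth in MB/s (decimal units)."""
    bits_per_s = GT_PER_S[gen] * 1e9 * ENCODING_EFFICIENCY * lanes
    return bits_per_s / 8 / 1e6

for lanes in (4, 2, 1):
    print(f"PCIe 3.0 x{lanes}: {link_mb_per_s(3, lanes):,.0f} MB/s")
# → PCIe 3.0 x4: 3,938 MB/s
# → PCIe 3.0 x2: 1,969 MB/s
# → PCIe 3.0 x1: 985 MB/s
```

By this estimate, a drive rated at 3.5 GB/s needs a full PCIe 3.0 x4 link; in a slot that only provides two lanes, it would top out below 2 GB/s.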

Budget

SATA SSDs tend to be more affordable than NVMe drives. However, you should consider the performance upgrade that an NVMe drive can offer. If you plan to make a lot of large file transfers or want the highest speeds for gaming, then a higher-priced NVMe SSD is worth the investment. For example, at the time of publication, a Western Digital 1TB SATA SSD retails for around $100, while a Western Digital 1TB NVMe drive starts at around $200.

Drive Capacity

SATA drives usually range from 500GB to 16TB in storage capacity. Most M.2 drives top out at 2TB, although some are available in 4TB and 8TB models at much higher prices.

Drive Speed

When choosing the right drive for your setup, remember that SATA M.2 drives and 2.5-inch SSDs provide the same level of speed, so to gain a performance increase, you will have to opt for NVMe-connected drives. While NVMe SSDs are much faster than SATA drives, you may also need to upgrade your processor to keep up, or you may not see the full performance benefit. Finally, check the read and write speeds listed for a specific drive, as speeds vary between generations of NVMe drives.

Choose the Right Drive for Your Setup

Before choosing a new drive, remember to back up all of your data. Backing up is essential as every drive will eventually fail and need to be replaced. The basis of a solid backup plan requires three copies of your data: one on your device, one backup saved locally, and one stored off-site. Storing a copy of your data in the cloud ensures that you’re able to retrieve it if any data loss occurs on your device.

Interested in learning more about other drive types or best ways to optimize your gaming setup? Let us know in the comments below.

The post What’s the Diff: SSD vs. NVMe vs. M.2 Drives appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

Multi-cloud Architecture Guide

Post Syndicated from Molly Clancy original https://www.backblaze.com/blog/multi-cloud-strategy-architecture-guide/

diagram of a multi-cloud workflow

Cloud technology transformed the way IT departments operate over the past decade and a half. A 2020 survey by IDG found that 81% of organizations have at least one application or a portion of their computing infrastructure in the cloud (up from 51% in 2011) and 55% of organizations currently use more than one cloud provider in a multi-cloud strategy.

Deploying a multi-cloud approach doesn’t have to be complicated—“multi-cloud” simply means using two or more different cloud providers and leveraging their advantages to suit your needs. This approach provides an alternative to relying on one cloud provider or on-premises infrastructure to handle everything.

If you’re among the 45% of organizations not yet using a multi-cloud approach, or if you want to get more out of your multi-cloud strategy, this post explains what multi-cloud is, how it works, the benefits it offers, and considerations to keep in mind when rolling out a multi-cloud strategy.

First, Some Multi-cloud History

The shift to multi-cloud infrastructure over the past decade and a half can be traced to two trends in the cloud computing landscape. First, AWS, Google, and Microsoft—otherwise known as the “Big Three”—are no longer the only options for IT departments looking to move to the cloud. Since AWS launched in 2006, specialized infrastructure as a service (IaaS) providers have emerged to challenge the Big Three, giving companies more options for cloud deployments.

Second, many companies spent the decade after AWS’s launch making the transition from on-premises to the cloud. Now, new companies launching today are built to be cloud native and existing companies are poised to optimize their cloud deployments. They’ve crossed the hurdle of moving on-premises infrastructure to the cloud and can focus on how to architect their cloud environments to maximize the advantages of multi-cloud.

What Is Multi-cloud?

Nearly every software as a service (SaaS) platform is hosted in the cloud. So, if your company uses a tool like OneDrive or Google Workspace along with any other cloud service or platform, you’re technically operating in a “multi-cloud” environment. But using more than one SaaS platform does not constitute a true multi-cloud strategy.

To narrow the definition, when we in the cloud services industry say “multi-cloud,” we mean the public cloud platforms you use to architect your company’s infrastructure, including storage, networking, and compute.

illustration of a single cloud workflow

By this definition, multi-cloud means using two or more different public IaaS providers rather than keeping all of your data with one diversified cloud provider like AWS or Google or using only on-premises infrastructure.

diagram of a multi-cloud workflow

Multi-cloud vs. Hybrid Cloud: What’s the Diff?

Multi-cloud refers to using more than one public cloud platform. Hybrid cloud refers to the combination of a private cloud with a public cloud. A private cloud is typically hosted on on-premises infrastructure, but can be hosted by a third party. The key difference between a private and public cloud is that the infrastructure, hardware, and software for a private cloud are maintained on a private network used exclusively by your business or organization.

Adding to the complexity, a company that uses a private cloud combined with more than one public cloud is really killing it with their cloud game using a hybrid multi-cloud strategy. It can all get pretty confusing, so stay tuned for a follow-up post that focuses solely on this topic.

How to Implement Multi-cloud: Use Cases

Companies operate multi-cloud environments for a variety of reasons. For some companies, the adoption of multi-cloud may have initially been an unintentional result of shadow IT—when separate departments adopt cloud services without engaging IT teams for assistance. As these deployments became integral to operations, IT teams likely incorporated them into an overall enterprise cloud strategy. For others, multi-cloud strategies are deployed intentionally given their suitability for specific business requirements.

So, how do you actually use a multi-cloud strategy, and what is a multi-cloud strategy good for? Multi-cloud has a number of compelling use cases and rationales, including:

  • Disaster recovery.
  • Failover.
  • Cost optimization.
  • Avoiding vendor lock-in.
  • Data sovereignty.
  • Access to specialized services.

Disaster Recovery

One of the biggest advantages of operating a multi-cloud environment is to achieve redundancy and plan for disaster recovery in a cloud-native deployment. Using multiple clouds helps IT departments implement a modern 3-2-1 backup strategy with three copies of their data, stored on two different types of media, with one stored off-site. When the 3-2-1 rule first took hold, it assumed the other two copies were kept on-premises for fast recoveries.
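The 3-2-1 rule is simple enough to express as a check. This is a minimal Python sketch, assuming each copy can be described by a location label, a media type, and an off-site flag; the labels are illustrative, and treating each distinct media string as a different type of media is a simplification.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class BackupCopy:
    location: str  # free-form label, e.g. "workstation" or "cloud-b" (illustrative)
    media: str     # storage type, e.g. "ssd", "hdd", "cloud-object-storage"
    offsite: bool  # True if the copy lives away from the primary site

def meets_3_2_1(copies: List[BackupCopy]) -> bool:
    """At least 3 copies, on at least 2 media types, with at least 1 off-site."""
    return (
        len(copies) >= 3
        and len({c.media for c in copies}) >= 2
        and any(c.offsite for c in copies)
    )

all_local = [
    BackupCopy("workstation", "ssd", False),
    BackupCopy("nas", "hdd", False),
    BackupCopy("usb-drive", "hdd", False),
]
with_cloud = all_local[:2] + [BackupCopy("cloud-b", "cloud-object-storage", True)]

print(meets_3_2_1(all_local))   # → False (nothing off-site)
print(meets_3_2_1(with_cloud))  # → True
```

Swapping the local USB copy for a copy in a separate cloud is what turns a non-compliant setup into a compliant one here.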

As cloud services improved, the need for an on-premises backup shifted. Data can now be recovered nearly as quickly from a cloud as from on-premises infrastructure, and many companies no longer use physical infrastructure at all. For companies that want to be or already are cloud-native, keeping data in multiple public clouds reduces the risk one runs when keeping both production and backup copies with one provider. In the event of a disaster or ransomware attack, the multi-cloud user can restore data stored in their other, separate cloud environment, ideally one that offers tools like Object Lock to protect data with immutability.

Failover

Similarly, some cloud-native companies utilize multiple cloud providers to host mirrored copies of their active production data. If one of their public clouds suffers an outage, they have mechanisms in place to direct their applications to failover to a second public cloud.

E-commerce company Big Cartel pursued this strategy after AWS suffered a number of outages in past years that gave Big Cartel cause for concern. They host more than one million websites on behalf of their clients, and an outage would take them all down. “Having a single storage provider was a single point of failure that we grew less and less comfortable with over time,” Big Cartel Technical Director Lee Jensen acknowledged. Now, their data is stored in two public clouds: Amazon S3 and Backblaze B2 Cloud Storage. Their content delivery network (CDN), Fastly, preferentially pulls data from Backblaze B2, with Amazon S3 as failover.
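The core of a preferred-origin-with-failover setup like the one described above can be sketched in a few lines of Python. This is not Fastly’s or Big Cartel’s implementation; `primary` and `secondary` are hypothetical stand-ins for clients of two different storage providers, and the outage is simulated.

```python
from typing import Callable, List, Sequence

Fetch = Callable[[str], bytes]

def fetch_with_failover(key: str, origins: Sequence[Fetch]) -> bytes:
    """Try each origin in preference order and return the first successful read."""
    errors: List[Exception] = []
    for fetch in origins:
        try:
            return fetch(key)
        except Exception as exc:  # a real client would catch narrower errors
            errors.append(exc)
    raise RuntimeError(f"all origins failed for {key!r}: {errors}")

# Hypothetical stand-ins for clients of two different providers.
def primary(key: str) -> bytes:
    raise ConnectionError("primary origin unavailable")  # simulated outage

def secondary(key: str) -> bytes:
    return b"contents of " + key.encode()

print(fetch_with_failover("site/logo.png", [primary, secondary]))
# → b'contents of site/logo.png'
```

A production setup would add timeouts and health checks so that an unresponsive origin fails fast instead of hanging.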

Matter Matters: A Big Cartel customer site.

Cost Optimization

Challenger companies can offer incentives that compete with the Big Three and pricing structures that suit specialized data use cases. For example, some cloud providers offer free egress but put limits on how much data can be downloaded, while others charge nominal egress fees, but don’t cap downloads. Savvy companies employ multiple clouds for different types of data depending on how much data they have and how often it needs to be accessed.

SIMMER.io, a community site that makes sharing Unity WebGL games easy for indie game developers, would get hit with egress spikes from Amazon S3 whenever one of their hosted games went viral. The fees turned their success into a growth inhibitor. SIMMER.io mirrored their data to Backblaze B2 Cloud Storage and reduced egress to $0 as a result of the Bandwidth Alliance partnership between Backblaze and Cloudflare. They can grow their site without having to worry about increasing egress costs over time or usage spikes when games go viral, and they doubled redundancy in the process.

Dragon Spirit: A SIMMER.io hosted game.

Avoiding Vendor Lock-in

Many companies initially adopted one of the Big Three because they were the only game in town, but later felt restricted by their closed systems. Companies like Amazon and Google don’t play nice with each other and both seek to lock customers in with proprietary services. Adopting a multi-cloud infrastructure with interoperable providers gives these companies more negotiating power and control over their cloud deployments.

For example, Gideo, a connected TV app platform, initially used an all-in-one cloud provider for compute, storage, and content delivery, but felt they had no leverage to reduce their bills or improve the service they were receiving. They adopted a multi-cloud approach, building a tech stack with a mix of unconflicted partners where they no longer feel beholden to one provider.

Data Sovereignty

Many countries, as well as the European Union, have passed laws that regulate where and how data can be stored. Companies subject to these data residency standards may employ a multi-cloud approach to ensure their data meets regulatory requirements. They use multiple public cloud providers with different geographic footprints in locations where data must be stored.
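In code, this often comes down to routing each record to an approved location based on where it must reside. A minimal Python sketch follows; the jurisdiction codes and the `provider-a`/`provider-b` bucket names are hypothetical placeholders, and real residency rules are usually more granular than one entry per region.

```python
# Hypothetical jurisdiction -> storage location map; the provider and bucket
# names are placeholders, not real endpoints.
RESIDENCY_TARGETS = {
    "EU": "provider-a/eu-central-bucket",
    "US": "provider-b/us-west-bucket",
}

def storage_target(jurisdiction: str) -> str:
    """Return the approved storage location for data from `jurisdiction`."""
    try:
        return RESIDENCY_TARGETS[jurisdiction]
    except KeyError:
        # Fail closed: refusing to store is safer than storing in the wrong region.
        raise ValueError(f"no approved storage location for {jurisdiction!r}") from None

print(storage_target("EU"))  # → provider-a/eu-central-bucket
```

Failing closed on an unknown jurisdiction is a deliberate choice: it surfaces a missing policy instead of silently storing data somewhere it may not be allowed.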

Access to Specialized Services

Organizations may use different cloud providers to access specialized or complementary services. For example, a company may use a public cloud like Vultr for access to compute resources or bare metal servers, but store their data with a different, interoperable public cloud that specializes in storage. Or a company may use a cloud storage provider in combination with a cloud CDN to distribute content faster to end users.

The Advantages of Multi-cloud Infrastructure

No matter the use case or rationale, companies achieve a number of advantages from deploying a multi-cloud infrastructure, including:

  1. Better Reliability and Lower Latency: In a failover scenario, if one cloud goes down, companies with a multi-cloud strategy have others to fall back on. If a company uses multiple clouds for data sovereignty or in combination with a CDN, they see reduced latency as their clouds are located closer to end users.
  2. Redundancy: With data in multiple, isolated clouds, companies are better protected from threats. If cybercriminals are able to access one set of data, companies are more likely to recover if they can restore from a second cloud environment that operates on a separate network.
  3. More Freedom and Flexibility: With a multi-cloud system, if something’s not working or if costs start to become unmanageable, companies have more leverage to influence changes and the ability to leave if another vendor offers better features or more affordable pricing. Businesses can also take advantage of industry partnerships to build flexible, cloud-agnostic tech stacks using best-of-breed providers.
  4. Affordability: It may seem counterintuitive that using more clouds would cost less, but it’s true. Diversified cloud providers like AWS make their services hard to quit for a reason—when you can’t leave, they can charge you whatever they want. A multi-cloud system allows you to take advantage of competitive pricing among platforms.
  5. Best-of-breed Services: Adopting a multi-cloud strategy means you can work with providers who specialize in doing one thing really well rather than doing all things middlingly. Cloud platforms specialize to offer customers top-of-the-line service, features, and support rather than providing a one-size-fits-all solution.

The Challenges of Multi-cloud Infrastructure

The advantages of a multi-cloud system have attracted an increasing number of companies, but it’s not without challenges. Controlling costs, data security, and governance were named in the top five challenges in the IDG study. That’s why it’s all the more important to consider your cloud infrastructure early on, follow best practices, and plan ways to manage eventualities.

Overcome multi-cloud challenges with multi-cloud best practices.

Multi-cloud Best Practices

As you plan your multi-cloud strategy, keep the following considerations in mind:

  • Deployment strategies.
  • Cost management.
  • Data security.
  • Governance.

Multi-cloud Deployment Strategies

There are likely as many ways to deploy a multi-cloud strategy as there are companies using a multi-cloud strategy. But, they generally fall into two broader categories—redundant or distributed.

In a redundant deployment, data is mirrored in more than one cloud environment, for example, for failover or disaster recovery. Companies that use a multi-cloud approach rather than a hybrid approach to store backup data are using a redundant multi-cloud deployment strategy. Most IT teams looking to use a multi-cloud approach to back up company data or environments will fall into this category.
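A redundant deployment boils down to writing every object to more than one provider. The Python sketch below shows the shape of that write path, using two plain dictionaries as hypothetical stand-ins for buckets at two different clouds; a real system would call each provider’s SDK or an S3-compatible API instead.

```python
from typing import Callable, Dict, List

PutFn = Callable[[str, bytes], None]

def mirror_put(key: str, data: bytes, stores: Dict[str, PutFn]) -> List[str]:
    """Write one object to every store; return the names of stores that failed."""
    failed = []
    for name, put in stores.items():
        try:
            put(key, data)
        except Exception:  # a real client would catch its own error types
            failed.append(name)
    return failed

# Two plain dicts stand in for buckets at two different providers.
cloud_a: Dict[str, bytes] = {}
cloud_b: Dict[str, bytes] = {}
stores = {"cloud-a": cloud_a.__setitem__, "cloud-b": cloud_b.__setitem__}

print(mirror_put("backups/2021-10-01.tar", b"...", stores))  # → []
print(cloud_a == cloud_b)  # → True
```

Returning the list of failed stores, rather than raising on the first error, lets a caller retry just the copy that fell behind. A distributed deployment would instead send each workload to a single, best-fit provider.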

A distributed deployment model more often applies to software development teams. In a distributed deployment, different workloads or different components of the same application are spread across multiple cloud computing environments based on the best fit. For example, a DevOps team might host their compute infrastructure in one public cloud and storage in another.

Your business requirements will dictate which type of deployment you should use. Knowing your deployment approach from the outset can help you pick providers with the right mix of services and billing structures for your multi-cloud strategy.

Multi-cloud Cost Management

Cost management of cloud environments is a challenge every company will face even if you choose to stay with one provider—so much so that companies make cloud optimization their whole business model. Set up a process to track your cloud utilization and spend, and seek out cloud providers that offer straightforward, transparent pricing to make budgeting simpler.

Multi-cloud Data Security

Security risks increase as your cloud environment becomes more complex. There are more attack surfaces, and you’ll want to plan security measures accordingly. To take advantage of multi-cloud benefits while reducing risk, follow multi-cloud security best practices:

  • Ensure you have controls in place for authentication across platforms. Your different cloud providers likely have different authentication protocols, and you need a framework and security protocols that work across providers.
  • Train your team appropriately to identify cybersecurity risks.
  • Stay up to date on security patches. Each cloud provider will publish their own upgrades and patches. Make sure to automate upgrades as much as possible.
  • Consider using a tool like Object Lock to protect data with immutability. Object Lock allows you to store objects using a Write Once, Read Many (WORM) model, meaning that after data is written, it cannot be modified or deleted for a defined period of time. Any attempts to manipulate, encrypt, change, or delete the file will fail during that time. The files may be accessed, but no one can change them, including the file owner or whoever set the Object Lock.
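To illustrate the WORM behavior described above, here is a toy in-memory model in Python. It is not Backblaze’s Object Lock API or any S3 client; it only assumes that each object carries a retain-until timestamp and that overwrites and deletes are refused before that time while reads stay open. The caller passes `now` explicitly so the behavior is easy to test.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, Tuple

class WormStore:
    """Toy in-memory model of write-once, read-many retention (not a real API)."""

    def __init__(self) -> None:
        # key -> (data, retain_until)
        self._objects: Dict[str, Tuple[bytes, datetime]] = {}

    def _check_unlocked(self, key: str, now: datetime) -> None:
        if key in self._objects and now < self._objects[key][1]:
            raise PermissionError(f"{key!r} is locked until {self._objects[key][1]}")

    def put(self, key: str, data: bytes, retain_for: timedelta, now: datetime) -> None:
        self._check_unlocked(key, now)  # no overwriting during retention
        self._objects[key] = (data, now + retain_for)

    def get(self, key: str) -> bytes:
        return self._objects[key][0]  # reads are always allowed

    def delete(self, key: str, now: datetime) -> None:
        self._check_unlocked(key, now)  # no deleting during retention
        del self._objects[key]

t0 = datetime(2021, 10, 1, tzinfo=timezone.utc)
store = WormStore()
store.put("backup.tar", b"v1", retain_for=timedelta(days=30), now=t0)
print(store.get("backup.tar"))  # → b'v1'
try:
    store.delete("backup.tar", now=t0 + timedelta(days=1))
except PermissionError as err:
    print("blocked:", err)
store.delete("backup.tar", now=t0 + timedelta(days=31))  # retention has expired
```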

Multi-cloud Governance

As cloud adoption grows across your company, you’ll need to have clear protocols for how your infrastructure is managed. Consider creating standard operating procedures for cloud platform management and provisioning to avoid shadow IT proliferation. And set up policies for centralized security monitoring.

Ready for Multi-cloud? Migration Strategies

If you’re ready to go multi-cloud, you’re probably wondering how to get your data from your on-premises infrastructure to the cloud or from one cloud to another. After choosing a provider that fits your needs, you can start planning your data migration. There are a range of tools for moving your data, but when it comes to moving between cloud services, a tool like our Cloud to Cloud Migration can help make things a lot easier and faster.

Have any more questions about multi-cloud or cloud migration? Let us know in the comments.

The post Multi-cloud Architecture Guide appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

The True Cost of Ransomware

Post Syndicated from Molly Clancy original https://www.backblaze.com/blog/the-true-cost-of-ransomware/

Today, cybercriminals demand ransoms on the order of hundreds of thousands or even millions of dollars. 2021 saw the highest ransom ever demanded hit $70 million in the REvil attack on Kaseya. But the ransoms themselves are just a portion, and often a small portion, of the overall cost of ransomware.

Big ransoms like the one above may make headlines, but a huge majority of attacks are carried out against small and medium-sized businesses (SMBs) and organizations—security consultant Coveware reported that they comprise 70% of all ransomware attacks. And the cost of recoveries can be staggering. In this post, we’re taking a look at the true cost of ransomware and the drivers of those costs.

This post is a part of our ongoing series on ransomware. Take a look at our other posts for more information on how businesses can defend themselves against a ransomware attack, and more.

Ransoms Are the First Item on the Bill

The Sophos State of Ransomware 2021 report, a survey of 5,400 IT decision makers in mid-sized organizations in 30 countries, found the average ransom payment was $170,404 in 2020. However, the spectrum of ransom payments was wide. The most common payment was $10,000 (paid by 20 respondents), with the highest payment a massive $3.2 million (paid by two respondents). In their own reporting, Coveware found that the average ransom payment was $136,576 in Q2 2021, but that number fluctuates quarter to quarter.

Source: Coveware.

Yet another source, Palo Alto Networks, recently reported that the average ransom payment hit $570,000—82% higher than 2020’s average of $312,000. Predictions from Cybersecurity Ventures paint an even bleaker picture, putting worldwide ransomware damages in the tens of billions of dollars by the end of 2021.

Though the numbers vary, the data show that ransoms are not just pocket change for SMBs any way you slice it.

But, Ransoms Are Far From the Only Cost

The true costs of ransomware recovery soar into the millions with the added complication of being much harder to quantify. According to Sophos, the average bill for recovering from a ransomware attack, including downtime, people hours, device costs, network costs, lost opportunities, ransom paid, etc. was $1.85 million in 2021. The cost of recovery comes from a wide range of factors, including:

  • Downtime.
  • People hours.
  • Stronger cybersecurity protections.
  • Repeat attacks.
  • Higher insurance premiums.
  • Legal defense and settlements.
  • Lost reputation.
  • Lost business.

Downtime

The downtime resulting from ransomware can be incredibly disruptive, and not just for the companies themselves. The Colonial Pipeline attack shut down gasoline service to almost half of the East Coast for six days. An attack on a Vermont health center had hospitals turning away patients. And an attack on Baltimore County Public Schools forced more than 100,000 students to miss classes. According to Coveware, the average downtime in Q2 2021 amounted to over three weeks (23 days). This time should be factored in when calculating the true cost of ransomware.

People Hours

While Colonial restored service after six days, CEO Joseph Blount testified before Congress more than a month after the attack that recovery was still ongoing. For a small business, most, if not all, of the company’s efforts will be directed toward recovery for a period of time. Obviously, the IT team will be focused on getting systems back up and running, but other areas of the business will be monopolized as well. Marketing and communications teams will be tasked with crisis communications. The finance team will be brought into ransom negotiations. Human resources will be fielding employee questions and concerns. Calculating the total hours spent on recovery may not be possible, but it’s a factor to consider in planning.
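Even a rough estimate helps put downtime and diverted people hours on the budget. The Python sketch below is one simple way to do that: it multiplies days of disruption by lost revenue and by staff time spent on recovery. The 23-day figure is the Coveware Q2 2021 average cited above; every other input is a hypothetical placeholder for a small business, and the model leaves out costs like legal fees and lost reputation.

```python
def downtime_cost(days: float, daily_revenue: float, revenue_lost_fraction: float,
                  staff: int, hours_per_day: float, hourly_rate: float) -> float:
    """Rough estimate: revenue lost while systems are down plus staff time
    diverted to recovery. All inputs are planning guesses."""
    lost_revenue = days * daily_revenue * revenue_lost_fraction
    people_cost = days * staff * hours_per_day * hourly_rate
    return lost_revenue + people_cost

# 23 days is the Coveware Q2 2021 average cited above; every other
# figure is a hypothetical placeholder.
estimate = downtime_cost(days=23, daily_revenue=10_000, revenue_lost_fraction=0.5,
                         staff=5, hours_per_day=8, hourly_rate=50)
print(f"${estimate:,.0f}")  # → $161,000
```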

Stronger Cybersecurity Protections

A company that’s been attacked by ransomware will likely allocate more budget to avoid the same fate in the future, and rightfully so. Moreover, the increase in attacks and subsequent tightening of requirements from insurance providers means that more companies will be forced to bring systems up to speed in order to maintain coverage.

Repeat Attacks

One of the cruel realities of being attacked by ransomware is that it makes businesses a target for repeat attacks. Unsurprisingly, hackers don’t always keep their promises when companies pay ransoms. In fact, paying ransoms lets cybercriminals know you’re an easy mark. This behavior used to be rare, but has become more common in 2021. We’ve seen reports of repeat attacks, either because companies already demonstrated willingness to pay or because the vulnerability that allowed hackers access to systems remained susceptible to exploitation. More ransomware operators have been exfiltrating additional data during the recovery period, and copycat operators have been exploiting vulnerabilities that go unaddressed even for a few days. Some companies ended up paying a second time.

Higher Insurance Premiums

As more and more companies file claims for ransomware attacks and recoveries, insurers are increasing premiums. The damages their customers are incurring are beginning to exceed estimates, forcing premiums to rise.

Legal Defense and Settlements

When attacks affect consumers or customers, victims can expect to hear from the lawyers. The Washington Post reported that Scripps Health, a San Diego hospital system, was hit with multiple class-action lawsuits after a ransomware attack in April. And big box stores like Target and Home Depot both paid settlements in the tens of millions of dollars following breaches. Even if your information security practices would hold up in court, the article explains that for most companies, it’s cheaper to settle than to suffer a protracted legal battle.

Lost Reputation and Lost Business

Thanks to the Colonial attack, ransomware is getting more coverage in the mainstream media. Hopefully this increased attention helps to discourage ransomware operators (they’re not in it for the fame, and it’s never a good day for cybercriminals when the president of the United States gets involved). But, that means companies are likely to be under more scrutiny if they happen to fall victim to an attack, jeopardizing their reputation and ability to develop business. And when companies lose their customers’ trust, they lose money.

What You Can Do About It: Defending Against Ransomware

The business of ransomware is booming with no signs of slowing down, and the cost of recovery is enough to put some ill-prepared companies out of business. If it feels like the cost of a ransomware recovery is out of reach, that’s all the more reason to invest in stronger security protocols and business continuity planning sooner rather than later.

For more information on the ransomware economy, the threat SMBs are facing, and steps you can take to protect your business, download The Complete Guide to Ransomware.

The post The True Cost of Ransomware appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

Introducing the Ransomware Economy

Post Syndicated from Molly Clancy original https://www.backblaze.com/blog/ransomware-economy/

Ransomware continues to proliferate for a simple reason—it’s profitable. And it’s profitable not just for the ransomware developers themselves—they’re just one part of the equation—but for a whole ecosystem of players who make up the ransomware economy. To understand the threats to small and medium-sized businesses (SMBs) and organizations today, it’s important to understand the scope and scale of what you’re up against.

Today, we’re digging into how the ransomware economy operates, including the broader ecosystem and the players involved, emerging threats to SMBs, and the overall financial footprint of ransomware worldwide.

This post is a part of our ongoing series on ransomware. Take a look at our other posts for more information on how businesses can defend themselves against a ransomware attack, and more.

Top Ransomware Syndicates in Operation Today

Cybercriminals have long been described as operating in “gangs.” The label conjures images of hackers furiously tapping away at glowing workstations in a shadowy warehouse. But the work of the ransomware economy today is more likely to take place in a boardroom than a back alley. Cybercriminals have graduated from gangs to highly complex organized crime syndicates that operate ransomware brands as part of a sophisticated business model.

Operators of these syndicates are just as likely to be worrying about user experience and customer service as they are about building malicious code. A look at the branding on display on some syndicates’ leak sites makes the case plain that these groups are more than a collective of expert coders—they’re savvy businesspeople.

images of ransomware gang marketing
Source: Bleepingcomputer.com.

Ransomware operators are often synonymous with the software variant they brand, deploy, and sell. Many have rebranded over the years or splintered into affiliated organizations. Some of the top ransomware brands operating today, along with high profile attacks they have carried out, are shown in the infographic below:

infographic of top ransomware brands

The groups shown above do not constitute an exhaustive list. In June 2021, FBI Director Christopher Wray stated that the FBI was investigating 100 different ransomware variants, and new ones pop up every day. While some brands have existed for years (Ryuk, for example), the list is also likely obsolete as soon as it’s published. Ransomware brands bubble up, go bust, and reorganize, changing with the cybersecurity tides.

Chainalysis, a blockchain data platform, published their Ransomware 2021: Critical Mid-year Update that shows just how much brands fluctuate year to year and, they note, even month to month:

Top 10 ransomware strains by revenue by year, 2014-2021 Q1
Source: Chainalysis.

How Ransomware Syndicates Operate

Ransomware operators may appear to be single entities, but there is a complex ecosystem of suppliers and ancillary providers behind them that exchange services with each other on the dark web. The flowchart below illustrates all the players and how they interact:

diagram of ransomware syndicate workflow

Dark Web Service Providers

Cybercrime “gangs” could once be tracked down and caught, like the David Levi Phishing Gang, which was investigated and prosecuted in 2005. Today’s decentralized ecosystem, however, makes going after ransomware operators all the more difficult. These independent entities may never interact with each other outside of the dark web, where they exchange services for cryptocurrency:

  • Botmasters: Create networks of infected computers and sell access to those compromised devices to threat actors.
  • Access Sellers: Take advantage of publicly disclosed vulnerabilities to infect servers before the vulnerabilities are remedied, then advertise and sell that access to threat actors.
  • Operators: The entity that actually carries out the attack with access purchased from botmasters or access sellers and software purchased from developers or developed in-house. May employ a full staff, including customer service, IT support, marketing, etc. depending on how sophisticated the syndicate is.
  • Developers: Write the ransomware software and sell it to threat actors for a cut of the ransom.
  • Packer Developers: Add protection layers to the software, making it harder to detect.
  • Analysts: Evaluate the victim’s financial health to advise on ransom amounts that they’re most likely to pay.
  • Affiliates: Purchase ransomware as a service from operators/developers who get a cut of the ransom.
  • Negotiating Agents: Handle interactions with victims.
  • Laundering Services: Exchange cryptocurrency for fiat currency on exchanges or otherwise transform ransom payments into usable assets.

ad for ransomware syndicate
Advertisement from an access seller for access to an organization’s RDP. Source: Threatpost.

Victim-side Service Providers

Beyond the collection of entities directly involved in the deployment of ransomware, the broader ecosystem includes other players on the victim’s side, who, for better or worse, stand to profit off of ransomware attacks. These include:

  • Incident response firms: Consultants who assist victims in response and recovery.
  • Ransomware brokers: Brought in to negotiate and handle payment on behalf of the victim and act as intermediaries between the victim and operators.
  • Insurance providers: Cover victims’ damages in the event of an attack.
  • Legal counsel: Often manage the relationship between the broker, insurance provider, and victim, and advise on ransom payment decision-making.

Are Victim-side Providers Complicit?

While these providers work on behalf of victims, they also perpetuate the cycle of ransomware. For example, insurance providers that cover businesses in the event of a ransomware attack often advise their customers to pay the ransom if they think it will minimize downtime, as the cost of extended downtime can far exceed the cost of a ransom payment. This becomes problematic for a few reasons:

  • First, paying the ransom incentivizes cybercriminals to continue plying their trade.
  • Second, as Colonial Pipeline discovered, the decryption tools provided by cybercriminals in exchange for ransom payments aren’t to be trusted. More than a month after Colonial paid the $4.4 million ransom and received a decryption tool from the hackers, CEO Joseph Blount testified before Congress that recovery from the attack was still not complete. After all that, they had to rely on recovering from their backups anyway.

The Emergence of Ransomware as a Service

In the ransomware economy, operators and their affiliates are the threat actors that carry out attacks. This affiliate model, in which operators sell ransomware as a service (RaaS), represents one of the biggest threats to SMBs and organizations today.

Cybercrime syndicates realized they could essentially license and sell their tech to affiliates who then carry out their own misdeeds empowered by another criminal’s software. The syndicates, affiliates, and other entities each take a portion of the ransom.

Operators advertise these partner programs on the dark web and thoroughly vet affiliates before bringing them on to filter out law enforcement posing as low-level criminals. One advertisement by the REvil syndicate noted, “No doubt, in the FBI and other special services, there are people who speak Russian perfectly, but their level is certainly not the one native speakers have. Check these people by asking them questions about the history of Ukraine, Belarus, Kazakhstan or Russia, which cannot be googled. Authentic proverbs, expressions, etc.”

Ransomware as a service ad
Advertisement for ransomware affiliates. Source: Kaspersky.

Though less sophisticated than some of the more notorious viruses, these “as a service” variants enable even amateur cybercriminals to carry out attacks. And they’re likely to carry those attacks out on the easiest prey—small businesses who don’t have the resources to implement adequate protections or weather extended downtime.

Hoping to increase their chances of being paid, low-level threat actors using RaaS typically demanded smaller ransoms, under $100,000, but that trend is changing. Coveware reported in August 2020 that affiliates were getting bolder in their demands. They reported the first six-figure payments to the Dharma ransomware group, an affiliate syndicate, in Q2 2020.

The one advantage savvy business owners have when it comes to RaaS: attacks are high volume (carried out against many thousands of targets) but low quality and easily identifiable by the time they are widely distributed. By staying on top of antivirus protections and detection, business owners can increase their chances of catching the attacks before it’s too late.

The Financial Side of the Ransomware Economy

So, how much money do ransomware crime syndicates actually make? The short answer is that it’s difficult to know because so many ransomware attacks go unreported. To get some idea of the size of the ransomware economy, analysts have to do some sleuthing.

Chainalysis tracks transactions to blockchain addresses linked to ransomware attacks in order to capture the size of ransomware revenues. In their regular reporting on the cybercrime cryptocurrency landscape, they showed that the total amount paid by ransomware victims increased by 311% in 2020 to reach nearly $350 million worth of cryptocurrency. In May, they published an update after identifying new ransomware addresses that put the number over $406 million. They expect the number will only continue to grow.

Total cryptocurrency value received by ransomware addresses, 2016-2021 (YTD)
Source: Chainalysis.

Similarly, threat intel company Advanced Intelligence and cybersecurity firm HYAS tracked Bitcoin transactions to 61 addresses associated with the Ryuk syndicate. They estimate that the operator may be worth upwards of $150 million alone. Their analysis sheds some light on how ransomware operators turn their exploits and the ransoms paid into usable cash.

Extorted funds are gathered in holding accounts, passed to money laundering services, then either funneled back into the criminal market and used to pay for other criminal services or cashed out at real cryptocurrency exchanges. The process follows these steps, as illustrated below:

  • The victim pays a broker.
  • The broker converts the cash into cryptocurrency.
  • The broker pays the ransomware operator in cryptocurrency.
  • The ransomware operator sends the cryptocurrency to a laundering service.
  • The laundering service exchanges the coins for fiat currency on cryptocurrency exchanges like Binance and Huobi.
diagram of ransomware payment flow
Source: AdvIntel.

In an interesting development, the report found that Ryuk actually bypassed laundering services and cashed out some of their own cryptocurrency directly on exchanges using stolen identities—a brash move for any organized crime operation.

Protecting Your Company From Ransomware

Even though the ransomware economy is ever-changing, having an awareness of where attacks come from and the threats you’re facing can prepare you if you ever face one yourself. To summarize:

  • Ransomware operators may seem to be single entities, but there’s a broad ecosystem of players behind them that trade services on the dark web.
  • Ransomware operators are sophisticated business entities.
  • RaaS enables even low-level criminals to get in the game.
  • Ransomware operators raked in at least $406 million in 2020, and likely more than that, as many ransomware attacks and payments go unreported.

We put this post together not to trade in fear, but to prepare SMBs and organizations with information in the fight against ransomware. And, you don’t have to fight it alone. Download our Complete Guide to Ransomware e-book for even more intel on ransomware today, plus steps to take to defend against ransomware, and how to respond if you do fall victim to an attack.

The post Introducing the Ransomware Economy appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

Developers Get EC2 Alternative With Vultr Cloud Compute and Bare Metal

Post Syndicated from Amrit Singh original https://www.backblaze.com/blog/developers-get-ec2-alternative-with-vultr-cloud-compute-and-bare-metal/

The old saying, “birds of a feather flock together,” couldn’t be more true of the latest addition to the Backblaze partner network. Today, we announce a new partnership with Vultr—the largest privately-owned, global hyperscale cloud—to serve developers and businesses with infrastructure as a service that’s easier to use and lower cost than perhaps better known alternatives.

With the Backblaze + Vultr combination, developers now have the ability to connect data stored in Backblaze B2 with virtualized cloud compute and bare metal resources in Vultr—providing a compelling alternative to Amazon S3 and EC2. Each Vultr compute instance includes a fixed amount of bandwidth, meaning that developers can easily transfer data between Vultr’s 17 global locations and Backblaze at no additional cost.

In addition to replacing AWS EC2, Vultr’s complete product line also offers load balancers and block storage that can seamlessly replace Amazon Elastic Load Balancing (ELB) and Amazon Elastic Block Store (EBS).

With this partnership, developers of any size can avoid vendor lock-in, access best of breed services, and do more with the data they have stored in the cloud with ease, including:

  • Running analysis on stored data.
  • Deploying applications and storing application data.
  • Transcoding media and provisioning origin storage for streaming and video on-demand applications.

Backblaze + Vultr: Better Together

Vultr’s ease of use and comparatively low costs have motivated more than 1.3 million developers around the world to use its service. We recognized a shared culture in Vultr, which is why we’re looking forward to seeing what our joint customers can do with this partnership. Like Backblaze, Vultr was founded with minimal outside investment. Both services are transparent, affordable, simple to start without having to talk to sales (although sales support is only a call or email away), and, above all, easy. Vultr is on a mission to simplify deployment of cloud infrastructure, and Backblaze is on a mission to simplify cloud storage.

Rather than trying to be everything for everyone, both businesses play to their strengths, and customers get the benefit of working with unconflicted partners.

Vultr’s pricing often comes in at half the cost of the big three—Amazon, Google, and Microsoft—and with Vultr’s bundled egress, we’re working together to alleviate the burden of bandwidth costs, which can be disproportionately huge for small and medium-sized businesses.

“The Backblaze-Vultr partnership means more developers can build the flexible tech stacks they want to build, without having to make needlessly tough choices between access and affordability,” said Shane Zide, VP of Global Partnerships at Vultr. “When two companies who focus on ease of use and price performance work together, the whole is greater than the sum of the parts.”

Fly Higher With Backblaze B2 + Vultr

Existing Backblaze B2 customers now have unfettered access to compute resources with Vultr, and Vultr customers can connect to astonishingly easy cloud storage with Backblaze B2. If you’re not yet a B2 Cloud Storage customer, create an account to get started in minutes. If you’re already a B2 Cloud Storage customer, click here to activate an account with Vultr.

For developers looking to do more with their data, we welcome you to join the flock. Get started with Backblaze B2 and Vultr today.

The post Developers Get EC2 Alternative With Vultr Cloud Compute and Bare Metal appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

Optimizing Website Performance With a CDN

Post Syndicated from Molly Clancy original https://www.backblaze.com/blog/optimizing-website-performance-with-a-cdn/

If you’ve ever wondered how a content delivery network (CDN) works, here’s a decent analogy… For most of the year, I keep one, maybe two boxes of tissues in the house. But, during allergy season, there’s a box in every room. When pollen counts are up, you need zero latency between sneeze and tissue deployment.

Instead of tissues in every room of the house, a CDN has servers in every corner of the globe, and they help reduce latency between a user’s request and when the website loads. If you want to make sure your website loads quickly no matter who accesses it, a CDN can help. Today, we’ll dig into the benefits of CDNs, how they work, and some common use cases with real-world examples.

What Is a CDN?

According to Cloudflare, one of the market leaders in CDN services, a CDN is “a geographically distributed group of servers which work together to provide fast delivery of internet content.” A CDN speeds up your website performance by temporarily keeping your website content on servers that are closer to end users. This is known as caching.

When someone in Australia visits your website that’s hosted in New York City, instead of fetching content like images, video, HTML pages, JavaScript files, etc. all the way from the “origin store” (the server where the main, original website lives in the Big Apple), the CDN serves that content from an “edge server” that’s geographically closer to the end user at the edge of the network. Your website loads much faster when the content doesn’t have to travel halfway around the world to reach your website visitors.

How Do CDNs Work?

While a CDN does consist of servers that host website content, a CDN cannot serve as a web host itself; you still need traditional web hosting to operate your website. The CDN just holds your website content on servers closer to your end users. It refers back to the main, original website content that’s stored on your origin store in case you make any changes or updates.

Your origin store could be an actual, on-premises server located wherever your business is headquartered, but many growing businesses opt to use cloud storage providers to serve as their origin store. With cloud storage, they can scale up or down as website content grows and only pay for what they need rather than investing in expensive on-premises servers and networking equipment.

The CDN provider sets up their edge servers at internet exchange points, or IXPs. IXPs are points where traffic flows between different internet service providers, much like a highway interchange, so your data can get to end users faster.

Source: Cloudflare.

Not all of your website content will be stored on edge servers all of the time. A user must first request that content. After the CDN retrieves it from the origin store, it keeps a copy on the edge server nearest to that user. Each cached copy has a specific “time to live,” or TTL, which specifies how long the edge server keeps it. Once the TTL expires, the edge server checks back with the origin store the next time the content is requested instead of serving its stored copy.
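To make the TTL idea concrete, here is a minimal sketch of a single edge server's cache, assuming one fixed TTL for every object and an in-memory dictionary as the cache. `EdgeCache` and `fetch_from_origin` are hypothetical names for illustration, not any CDN's API. Real CDNs typically take TTLs from origin headers or their own configuration, may revalidate an expired copy rather than re-download it, and can evict content early when space runs low.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, Tuple


@dataclass
class EdgeCache:
    """Toy single edge server: cached copies expire a fixed TTL after they are fetched."""

    ttl: float                                  # seconds a cached copy stays fresh
    fetch_from_origin: Callable[[str], bytes]   # hypothetical origin-store fetch
    clock: Callable[[], float] = time.monotonic
    hits: int = 0
    misses: int = 0
    _store: Dict[str, Tuple[bytes, float]] = field(default_factory=dict)

    def get(self, path: str) -> bytes:
        now = self.clock()
        cached = self._store.get(path)
        if cached is not None and now < cached[1]:
            # Cache hit: a fresh copy already lives on this edge server.
            self.hits += 1
            return cached[0]
        # Cache miss: never cached, or the TTL ran out, so go back to the origin.
        self.misses += 1
        body = self.fetch_from_origin(path)
        self._store[path] = (body, now + self.ttl)
        return body


# Usage with a fake clock so the TTL behavior is easy to follow.
fake_now = [0.0]
edge = EdgeCache(ttl=60, fetch_from_origin=lambda p: b"<html>home</html>",
                 clock=lambda: fake_now[0])
edge.get("/index.html")   # t=0s: miss, copied from the origin
fake_now[0] = 30
edge.get("/index.html")   # t=30s: hit, still within the 60s TTL
fake_now[0] = 61
edge.get("/index.html")   # t=61s: miss, the TTL has expired
print(edge.hits, edge.misses)
```

Injecting the clock keeps the example deterministic; in a real service the default `time.monotonic` would be used.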

When a user pulls up website content from the cache on the edge server, it’s known as a cache hit. When the content is not in the cache and must be fetched from the origin store, it’s known as a cache miss. The share of requests served as hits (hits divided by total requests) is known as the cache hit ratio, and it’s an important metric for website owners who use cloud storage as their origin and are trying to optimize their egress fees (the fees cloud storage providers charge to send data out of their systems). The higher the cache hit ratio, the less they’ll be charged for egress out of their origin store.
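The link between cache hit ratio and egress can be sketched in a few lines. This assumes, as a simplification, that every miss pulls one full object from the origin and every hit costs the origin nothing; the request volume and object size below are hypothetical.

```python
def cache_hit_ratio(hits: int, misses: int) -> float:
    """Share of requests answered from the edge cache: hits / (hits + misses)."""
    total = hits + misses
    return hits / total if total else 0.0


def origin_egress_gb(requests: int, hit_ratio: float, avg_object_mb: float) -> float:
    """GB leaving the origin store, assuming every miss pulls one full object from it."""
    misses = requests * (1.0 - hit_ratio)
    return misses * avg_object_mb / 1000.0  # decimal units: 1 GB = 1,000 MB


# Hypothetical month: one million requests for objects averaging 0.5 MB.
for ratio in (0.0, 0.5, 0.9, 0.99):
    print(f"hit ratio {ratio:.0%}: {origin_egress_gb(1_000_000, ratio, 0.5):,.1f} GB from origin")
```

Under these assumptions, origin egress scales with the miss rate, so moving from a 90% to a 99% hit ratio cuts origin egress by roughly a factor of ten.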

Another important metric for CDN users is round trip time, or RTT. RTT is the time it takes for a request from a user to travel to its destination and back again. RTT metrics help website owners understand the health of a network and the speed of network connections. A CDN’s primary purpose is to reduce RTT as much as possible.
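Distance alone puts a floor under RTT, which is why moving content closer helps. The sketch below estimates that floor for the Australia-to-New York example, assuming light in optical fiber travels at roughly 200,000 km per second and that the cable follows the great-circle path. Both are simplifications: real routes are longer and add routing and queuing delay, so measured RTTs will be higher. The coordinates and the 50 km edge distance are illustrative.

```python
import math

EARTH_RADIUS_KM = 6371.0
# Assumption: light in optical fiber covers roughly 200,000 km per second
# (about two-thirds of its speed in a vacuum).
FIBER_KM_PER_S = 200_000.0


def great_circle_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Haversine distance between two points given in decimal degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def min_rtt_ms(distance_km: float) -> float:
    """Propagation-only lower bound on RTT: out and back, ignoring routing and queuing."""
    return 2 * distance_km / FIBER_KM_PER_S * 1000.0


new_york = (40.7128, -74.0060)
sydney = (-33.8688, 151.2093)
d = great_circle_km(*new_york, *sydney)
print(f"Sydney to New York origin: {d:,.0f} km, at least {min_rtt_ms(d):.0f} ms round trip")
print(f"Sydney to an edge server 50 km away: at least {min_rtt_ms(50):.1f} ms round trip")
```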

Key Terms: Demystifying Acronym Soup

  • Origin Store: The main server or cloud storage provider where your website content lives.
  • CDN: Content delivery network, a geographically distributed group of servers that work to deliver internet content fast.
  • Edge Server: Servers in a CDN network that are located at the edge of the network.
  • IXP: Internet exchange point, a point where traffic flows between different internet service providers.
  • TTL: Time to live, the time content has to live on edge servers.
  • RTT: Round trip time, the time it takes for a request from a user to travel to its destination and back.
  • Cache Hit Ratio: The share of requests for content that are served from edge servers in the CDN network rather than retrieved from the origin store.

Do I Need a CDN?

CDNs are a necessity for companies with a global presence or with particularly complex websites that deliver a lot of content, but you don’t have to be a huge enterprise to benefit from a CDN. You might be surprised to know that more than half of the world’s website content is delivered by CDN, according to Akamai, one of the first providers to offer CDN services.

What Are the Benefits of a CDN?

A CDN offers a few specific benefits for companies, including:

  • Faster website load times.
  • Lower bandwidth costs.
  • Redundancy and scalability during high traffic times.
  • Improved security.

Faster website load times: Content is distributed closer to visitors, which is incredibly important for reducing bounce rates. Website visitors are far more likely to click away from a site the longer it takes to load. The probability of a bounce increases 90% as the page load time goes from one second to five on mobile devices, and website conversion rates drop by an average of 4.42% with each additional second of load time. If an e-commerce company makes $50 per conversion and does about $150,000 per month in business, a drop in conversion of 4.42% would equate to a loss of almost $80,000 per year.
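The arithmetic behind the $80,000 figure is easy to reproduce. This sketch assumes, as the example does, that all revenue comes from conversions and that the 4.42% drop (one extra second of load time) applies evenly across the whole year.

```python
def annual_conversion_loss(monthly_revenue: float, conversion_drop: float) -> float:
    """Revenue lost per year if conversions fall by `conversion_drop` (a fraction)."""
    return monthly_revenue * 12 * conversion_drop


revenue_per_conversion = 50.0
monthly_revenue = 150_000.0
conversions_per_month = monthly_revenue / revenue_per_conversion   # 3,000 conversions
lost_conversions_per_month = conversions_per_month * 0.0442         # about 132.6
loss = annual_conversion_loss(monthly_revenue, 0.0442)
print(f"{lost_conversions_per_month:.1f} lost conversions/month, ${loss:,.0f} lost/year")
```

That works out to $150,000 × 12 × 0.0442 = $79,560, which is the "almost $80,000 per year" in the text.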

If you still think seconds can’t make that much of a difference, think again. Amazon calculated that a page load slowdown of just one second could cost it $1.6 billion in sales each year. With website content distributed closer to website users via a CDN, pages load faster, reducing bounce rates.

Image credit: HubSpot. Data credit: Portent.

Lower bandwidth costs: Bandwidth costs are the costs companies and website owners pay to move their data around telecommunications networks. The farther your data has to go and the faster it needs to get there, the more you’re going to pay in bandwidth costs. The caching that a CDN provides reduces the need for content to travel as far, thus reducing bandwidth costs.

Redundancy and scalability during high traffic times: With multiple servers, a CDN can handle hardware failures better than relying on one origin server alone. If one goes down, another server can pick up the slack. Also, when traffic spikes, a single origin server may not be able to handle the load. Since CDNs are geographically distributed, they spread traffic out over more servers during high traffic times and can handle more traffic than just an origin server.

Improved security: In a DDoS, or distributed denial-of-service attack, malicious actors will try to flood a server or network with traffic to overwhelm it. Most CDNs offer security measures like DDoS mitigation, the ability to block content at the edge, and other enhanced security features.

CDN Cost and Complexity

CDN costs vary by the use case, but getting started can be relatively low or no-cost. Some CDN providers like Cloudflare offer a free tier if you’re just starting a business or for personal or hobby projects, and upgrading to Cloudflare’s Pro tier is just $20 a month for added security features and accelerated mobile load speeds. Other providers, like Fastly, offer a free trial.

Beyond the free tier or trial, pricing for most CDN providers is dynamic. For Amazon CloudFront, for example, you’ll pay different rates for different volumes of data in different regions. It can get complicated quickly, and some CDNs will want to work directly with you on a quote.

At an enterprise scale, understanding if CDN pricing is worth it is a matter of comparing the cost of the CDN to the cost of what you would have paid in egress fees. Some cloud providers and CDNs like those in the Bandwidth Alliance have also teamed up to pass egress savings on to shared users, which can substantially reduce costs related to content storage and delivery. Look into discounts like this when searching for a CDN.
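One rough way to run that comparison is sketched below. Every rate is a hypothetical placeholder, not any provider's actual pricing. The model assumes the CDN bills per GB delivered (plus an optional flat fee) and that only cache misses incur origin egress. The "waived" case stands in for an egress-sharing arrangement like the one described above.

```python
def monthly_cost_without_cdn(delivered_gb: float, origin_egress_per_gb: float) -> float:
    """Every byte visitors download is billed as egress from the origin store."""
    return delivered_gb * origin_egress_per_gb


def monthly_cost_with_cdn(delivered_gb: float, hit_ratio: float,
                          origin_egress_per_gb: float, cdn_per_gb: float,
                          cdn_flat_fee: float = 0.0) -> float:
    """CDN delivery charges plus origin egress for cache misses only."""
    origin_gb = delivered_gb * (1.0 - hit_ratio)
    return cdn_flat_fee + delivered_gb * cdn_per_gb + origin_gb * origin_egress_per_gb


# Hypothetical inputs: 10,000 GB delivered per month at a 95% cache hit ratio.
delivered, hit = 10_000, 0.95
before = monthly_cost_without_cdn(delivered, origin_egress_per_gb=0.09)
after = monthly_cost_with_cdn(delivered, hit, origin_egress_per_gb=0.09, cdn_per_gb=0.04)
# If the storage provider and CDN waive egress between them, origin egress drops to $0.
after_waived = monthly_cost_with_cdn(delivered, hit, origin_egress_per_gb=0.0, cdn_per_gb=0.04)
print(f"origin only: ${before:,.2f}  with CDN: ${after:,.2f}  with waived egress: ${after_waived:,.2f}")
```

Plugging in a real quote for `cdn_per_gb` and your own measured hit ratio turns this into a quick break-even check.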

Another way to evaluate if a CDN is right for your business is to look at the opportunity cost of not having one. Using the example above, an e-commerce company that makes $50 per conversion and does $150,000 of business per month stands to lose $80,000 per year due to latency issues. While CDN costs can reach into the thousands per month, the exercise of researching CDN providers and pricing out what your particular spend might be is definitely worth it when you stand to save that much in lost opportunities.

Setting up a CDN is relatively easy. You just need to create an account and connect it to your origin server. Each provider will have documentation to walk you through how to configure their service. Beyond the basic setup, CDNs offer additional features and services like health checks, streaming logs, and security features you can configure to get the most out of your CDN instance. Fastly, for example, allows you to create custom configurations using their Varnish Configuration Language, or VCL. If you’re just starting out, setting up a CDN can be very simple, but if you need or want more bells and whistles, the capabilities are definitely out there.

Who Benefits Most From a CDN?

While a CDN is beneficial for any company with broad geographic reach or a content-heavy site, some specific industries see more benefits from a CDN than others, including e-commerce, streaming media, and gaming.

E-commerce and CDN: Most e-commerce companies also host lots of images and videos to best display their products to customers, so they have lots of content that needs to be delivered. They also stand to lose the most business from slow loading websites, so implementing a CDN is a natural fit for them.

E-commerce Hosting Provider Delivers One Million Websites

Big Cartel is an e-commerce platform that makes it easy for artists, musicians, and independent business owners to build unique online stores. They’ve long used a CDN to make sure they can deliver more than one million websites around the globe at speed on behalf of their customers.

They switched from Amazon’s CloudFront to Fastly in 2015. The team felt that Fastly, an API-first, edge cloud platform designed for programmability, gave Big Cartel more functionality and control than CloudFront. With the Fastly VCL, Big Cartel can detect patterns of abusive behavior, block content at the edge, and optimize images for different browsers on the fly. “Fastly has really been a force multiplier for us. They came into the space with published, open, transparent pricing and the configurability of VCL won us over,” said Lee Jensen, Big Cartel’s Technical Director.

Streaming Media and CDN: Like e-commerce sites, streaming media sites host a lot of content, and need to deliver that content with speed and reliability. Anyone who’s lost service in the middle of a Netflix binge knows: buffering and dropped shows won’t cut it.

Movie Streaming Platform Triples Redundancy

Kanopy is a video streaming platform serving more than 4,000 libraries and 45 million patrons worldwide. In order for a film to be streamed without delays or buffering, it must first be transcoded into a compressed streaming format and broken up into smaller files known as “chunks.” A feature-length film may translate to thousands of five to 10-second chunks, and losing just one can cause playback issues that disrupt the customer viewing experience.
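A quick count shows how a single film reaches thousands of files and why one missing chunk matters. This is a sketch under stated assumptions: a two-hour film, six-second segments (within the five to 10-second range above), and five bitrate or resolution renditions. The rendition count and the chunk index 742 are hypothetical. The helper names are illustrative and not part of any transcoding tool.

```python
import math
from typing import List, Set


def chunk_count(duration_s: float, segment_s: float) -> int:
    """Fixed-length segments needed to cover a video; the last one may be shorter."""
    return math.ceil(duration_s / segment_s)


def missing_chunks(expected: int, stored: Set[int]) -> List[int]:
    """Indices of chunks that should exist but are absent from storage."""
    return [i for i in range(expected) if i not in stored]


film_s = 2 * 60 * 60          # hypothetical two-hour feature, in seconds
per_rendition = chunk_count(film_s, 6)
renditions = 5                # hypothetical number of bitrate/resolution variants
print(per_rendition, per_rendition * renditions)

# A single lost chunk leaves a gap that a viewer will hit during playback.
stored = set(range(per_rendition)) - {742}
print(missing_chunks(per_rendition, stored))
```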

Kanopy used a provider that offered a CDN, origin storage, and transcoding all in one, but the provider lost chunks, disrupting the viewing experience. One thing their legacy CDN didn’t provide was backups. If the file couldn’t be located in their primary storage, it was gone.

They switched to a multi-cloud stack, engaging Cloudflare as a CDN and tripled their redundancy by using a cold archive, an origin store, and backup storage.
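Keeping three copies only helps if reads can fall back between them. Here is a minimal sketch of that idea, assuming each tier can be read with a simple lookup that returns nothing when the object is missing. In-memory dictionaries stand in for the origin store, backup storage, and cold archive. This is not a description of Kanopy's actual setup, and a real cold archive may require a restore step before data can be read.

```python
from typing import Callable, List, Optional, Tuple

# A storage tier is modeled as a lookup that returns the object, or None if it is missing.
Fetch = Callable[[str], Optional[bytes]]


def read_with_fallback(key: str, tiers: List[Tuple[str, Fetch]]) -> Tuple[str, bytes]:
    """Try each tier in order and return (tier name, data) from the first that has the key."""
    for name, fetch in tiers:
        data = fetch(key)
        if data is not None:
            return name, data
    raise KeyError(f"{key!r} is missing from every tier")


# Hypothetical in-memory stand-ins for the three copies.
origin = {"film/chunk_0001.ts": b"a"}
backup = {"film/chunk_0001.ts": b"a", "film/chunk_0002.ts": b"b"}
cold_archive = {"film/chunk_0001.ts": b"a", "film/chunk_0002.ts": b"b"}
tiers = [("origin", origin.get), ("backup", backup.get), ("cold archive", cold_archive.get)]

print(read_with_fallback("film/chunk_0001.ts", tiers))  # found in the origin store
print(read_with_fallback("film/chunk_0002.ts", tiers))  # origin lost it; backup has it
```

The tier order encodes cost and speed: the origin is tried first, and the slower, cheaper copies are only touched when something has gone missing.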

Gaming and CDN: Gaming platforms, too, have a heavy load of graphics, images, and video to manage. They also need to deliver content quickly, or they risk games glitching at a decisive moment.

Gaming Platform Wins When Games Go Viral

SIMMER.io is a community site that makes sharing Unity WebGL games easy for indie game developers. Whenever a game went viral, their egress costs boiled over, hindering growth. SIMMER.io mirrored their data from Amazon S3 to Backblaze B2 and, thanks to the Bandwidth Alliance, reduced their egress fees to $0. They can now grow their site without worrying about egress costs rising over time or spiking when games go viral.

In addition to the types of companies listed above, financial institutions, media properties, mobile apps, and government entities can benefit from a CDN as well. However, a CDN is not going to be right for everyone. If your audience is concentrated in a specific geographic area, you likely don’t need a CDN and can simply use a web host located near that audience.

Pairing CDN With Cloud Storage

A CDN doesn’t cache every single piece of data: there will be times when a user’s request is served directly from the origin store. Reliable, affordable, and performant origin storage becomes critical when the requested content isn’t in the cache. By pairing a CDN with origin storage in the cloud, companies can benefit from the elasticity and scalability of the cloud and the performance and speed of a CDN’s edge network.
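To make the cache-miss path concrete, here is a toy model of an edge node. Requests it has seen before are served from the edge. Anything else falls through to origin storage and is cached on the way back. This sketch makes simplifying assumptions: it uses a plain in-memory dict and has no TTLs, eviction, or cache-control handling. It does not reflect how any particular CDN is implemented, and `fetch_from_origin` is a hypothetical stand-in for a request to a storage bucket.

```python
from typing import Callable, Dict, List


class EdgeCache:
    """Toy edge cache: serve hits locally, pull misses from origin storage."""

    def __init__(self, fetch_from_origin: Callable[[str], bytes]) -> None:
        self._fetch = fetch_from_origin
        self._store: Dict[str, bytes] = {}
        self.hits = 0
        self.misses = 0

    def get(self, path: str) -> bytes:
        if path in self._store:
            self.hits += 1
            return self._store[path]
        # Cache miss: the request falls through to the origin store.
        self.misses += 1
        body = self._fetch(path)
        self._store[path] = body
        return body


origin_calls: List[str] = []


def fetch_from_origin(path: str) -> bytes:
    """Hypothetical origin fetch; records each call so we can see the misses."""
    origin_calls.append(path)
    return f"contents of {path}".encode()


edge = EdgeCache(fetch_from_origin)
edge.get("/img/logo.png")  # miss: fetched from origin
edge.get("/img/logo.png")  # hit: served from the edge
print(edge.hits, edge.misses, origin_calls)
```

Every miss is a read from origin, so the origin’s reliability and egress pricing feed directly into the CDN setup’s overall cost and performance.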

Still wondering if a CDN is right for you? Let us know your questions in the comments.

The post Optimizing Website Performance With a CDN appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

Next Steps for Chia, in Their Own Words

Post Syndicated from Molly Clancy original https://www.backblaze.com/blog/next-steps-for-chia-in-their-own-words/

A few weeks ago we published a post about why Backblaze chose not to farm Chia in our storage buffer. Our explanation was pretty simple: We agreed we weren’t in the game of currency speculation, so we just took Chia’s price and Netspace at the time and ran some math. In the end, it didn’t work for us, but our analysis isn’t the last word. We’re as curious as the next person about what happens next with Chia.

The Chia Netspace has slowed its exponential climb since we ran the post. At the time, it was increasing by about 33% each week. It’s now hovering between 31 and 33 exabytes, leaving us, and we presume a lot of other people, wondering what the future looks like for this cryptocurrency.

Jonmichael Hands, the VP of storage business development at Chia, reached out offering to discuss our post, and we figured he’d be a good guy to ask. So we gathered a few questions and sat down with him to learn more about what he sees on the horizon and what a wild ride it’s been so far.

Editor’s Note: This interview has been edited for length and clarity.

Q: What brought you to the Chia project?

I was involved in the beta about a year ago. It was right in the middle of COVID, so instead of traveling frequently for work, I built Chia farms in my garage and talked to a bunch of strangers on Keybase all night. At that time, the alpha version of the Chia plotter wrote six times more data than it writes today. I messaged the Chia president, saying, “You can’t release this software now. It’s going to obliterate consumer SSDs.” Before joining Chia, when I was at Intel, I did a lot of work on SSD endurance, so over the course of the year I helped Chia understand the software and how to optimize it for SSD endurance. Chia sits at the intersection of my main areas of expertise (storage, data centers, and cryptocurrencies), so it was a natural fit, and I joined the team in May 2021.

Q: What was the outcome of that work to optimize the software?

Over the year, we got it down to about 1.3TB of writes per plot, which is what it takes today. It’s still not a very friendly workload for consumer SSDs, and we definitely did not want people buying consumer SSDs and accidentally wearing them out for Chia. There has been a ton of community development in Chia plotters since the launch to further improve performance and efficiency.

Q: That was a question we had, because Chia got a lot of backlash about people burning out consumer SSDs. What is your response to that criticism?

We did a lot of internal analysis to see if that was true, because we take it very seriously. So, how many SSDs are we burning out? I erred on the side of caution and assumed that 50% of the Netspace was farmed using consumer SSDs. I used the endurance calculator that I wrote for Chia on the wiki, which estimates how many plots you can write until the drive wears out. With 32 exabytes of Netspace, my math shows Chia wore out about 44,000 drives.
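This kind of wear estimate mostly comes down to division: a drive’s rated endurance in terabytes written (TBW), divided by the roughly 1.3TB each plot writes. The following is a simplified sketch, not the calculator on the Chia wiki. The 600 TBW rating is hypothetical, and the sketch ignores write amplification and any other workloads on the drive. The `life_fraction` parameter models plotting only until a drive has used part of its rated life and then repurposing it.

```python
import math


def plots_until_worn_out(tbw_rating_tb: float,
                         writes_per_plot_tb: float = 1.3,
                         life_fraction: float = 1.0) -> int:
    """Whole plots a drive can write before using `life_fraction` of its rated TBW."""
    budget_tb = tbw_rating_tb * life_fraction
    return math.floor(budget_tb / writes_per_plot_tb)


# Hypothetical consumer SSD rated for 600 TB written.
print(plots_until_worn_out(600))                     # full rated endurance
print(plots_until_worn_out(600, life_fraction=0.8))  # stop at 80% and repurpose
```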

That seems high to me because I think consumers are smart. For the most part, I expect people have been doing their own research and buying the appropriate drives. We’ve also seen people plot on a consumer drive until it reaches a certain percentage of its useful life and then use it for something else. That’s a smart way to use new SSDs: you get maximum use out of the hardware.

Companies are also offering plotting as a service. There are 50 or 60 providers who will just plot for you, likely using enterprise SSDs. So, I think 44,000 drives is a high estimate.

In 2021, there were 435 million SSD units shipped. With that many drives, how many SSD failures should we expect per year in total? We know the annualized failure rates, so it’s easy to figure out. Even in a best-case scenario, I calculated there were probably 2.5 million SSD failures in 2021. If we created 44,000 extra failures, and that’s the high end, we’d only be contributing 1.5% of total failures.
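You can reproduce the ratio from the two figures quoted. The result depends on whether the 44,000 worn-out drives are counted on top of the roughly 2.5 million baseline failures or as part of it. Either way, the share comes out between about 1.7% and 1.8%, a small slice of the total.

```python
def share_of_failures(extra: int, baseline: int) -> float:
    """Fraction of all failures contributed by `extra` failures added to `baseline`."""
    return extra / (baseline + extra)


chia_high_estimate = 44_000    # worn-out drives, high-end estimate from the interview
baseline_failures = 2_500_000  # estimated SSD failures in 2021

print(f"{share_of_failures(chia_high_estimate, baseline_failures):.2%}")  # 1.73%
# If the 44,000 are counted within the 2.5 million instead:
print(f"{chia_high_estimate / baseline_failures:.2%}")  # 1.76%
```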

Q: So, do you believe the e-waste argument is misplaced?

I’ve learned a lot about e-waste in the last few weeks. I talked to some e-waste facilities, and the amount of e-waste that SSDs create is small compared to other component parts, which is why SSD companies haven’t been attacked for e-waste before. They’re light and they don’t contain too many hazardous materials, comparatively. Most of them last five to 10 years as well. So we don’t believe there’s a large contribution from us in that area.

On the other hand, millions of hard drives get shredded each year, mostly by hyperscale data centers because end customers don’t want their data “getting out,” which is silly. I’ve talked to experts in the field, and I’ve done a lot of work myself on sanitization and secure erase and self-encrypting drives. With self-encrypting drives, you can basically instantly wipe the data and repurpose the drive for something else.

The data is erasure coded and encrypted before it hits the drive, so you can securely wipe the crypto key on the drive and make the data unreadable. Even so, tens of millions of drives are crushed every year, many of them for security reasons well before the end of their useful life. We think there’s an opportunity among those wasted drives.
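Crypto erase works because the data on the media is only ever stored in encrypted form. The toy model below illustrates the idea with a SHAKE-256 keystream from Python’s standard library. It is an illustration only. It is not real-world cryptography, it is not how drive firmware works, and it leaves out the erasure-coding step entirely. Class and method names are hypothetical.

```python
import hashlib
import secrets
from typing import Dict, Tuple


def xor_keystream(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """Toy stream cipher: XOR data with a SHAKE-256 keystream (encrypts and decrypts)."""
    stream = hashlib.shake_256(key + nonce).digest(len(data))
    return bytes(d ^ s for d, s in zip(data, stream))


class ToySelfEncryptingDrive:
    """Models the idea only; real self-encrypting drives encrypt in hardware."""

    def __init__(self) -> None:
        self._key = secrets.token_bytes(32)  # media key never leaves the "drive"
        self._blocks: Dict[int, Tuple[bytes, bytes]] = {}

    def write(self, lba: int, data: bytes) -> None:
        nonce = secrets.token_bytes(16)  # fresh nonce per write
        self._blocks[lba] = (nonce, xor_keystream(self._key, nonce, data))

    def read(self, lba: int) -> bytes:
        nonce, ciphertext = self._blocks[lba]
        return xor_keystream(self._key, nonce, ciphertext)

    def crypto_erase(self) -> None:
        # Discard the old key; the ciphertext stays put but no longer decrypts.
        self._key = secrets.token_bytes(32)


drive = ToySelfEncryptingDrive()
drive.write(0, b"customer records")
assert drive.read(0) == b"customer records"
drive.crypto_erase()
print(drive.read(0) == b"customer records")  # False: the old data is now noise
```

The drive itself can then be reused: new writes are encrypted under the new key, and nothing had to be physically destroyed.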

Our team has a lot of ideas for how we could use Chia to accelerate markets for third-party recycled and renewed drives to get them back in the hands of Chia farmers and create a whole circular economy. If we’re successful in unlocking that storage, that will bring the cost of storage down. It will be a huge win for us and put us solidly on the right side of the e-waste issue.

Q: Did you expect the boom that happened earlier this summer and the spikes it created in the hard drive market?

Everybody at the company had their own Netspace model. My model was based off of the hard drive supply-demand sufficiency curve. If the market is undersupplied, prices go up. If the market’s vastly undersupplied, prices go up exponentially.

IDC says 1.2 zettabytes of hard drives are shipped every year, but the retail supply of HDDs is not very big. My model said when we hit 1% of the total hard drive supply for the year, prices are going to go up about 15%. If we hit 2% or 3%, prices will go up 30% to 40%. It turns out I was right that hard drive prices would go up, but I was wrong about the profitability.
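In absolute terms, the 1% threshold is smaller than it might sound. Converting the IDC figure to exabytes (1 zettabyte = 1,000 exabytes) shows that 1% of a year’s shipments is about 12 exabytes. A Netspace of about 30 exabytes is about 2.5%, inside the 2% to 3% band where the model predicted 30% to 40% price increases. This simple sketch treats all Netspace as if it were new drives, which overstates demand, since some plots sit on storage people already owned.

```python
ANNUAL_HDD_SHIPMENTS_EB = 1_200  # IDC's ~1.2 zettabytes per year, in exabytes


def share_of_annual_supply(capacity_eb: float) -> float:
    """Capacity as a fraction of one year's worth of hard drive shipments."""
    return capacity_eb / ANNUAL_HDD_SHIPMENTS_EB


# Capacity at the model's ~15% price-increase threshold (1% of supply).
print(share_of_annual_supply(12))
# A Netspace of roughly 30 exabytes.
print(f"{share_of_annual_supply(30):.1%}")
```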

It was the perfect black swan event. We launched the network on March 19 at 120 petabytes. Bitcoin was at an all-time high in April. We had this very low Netspace and this very high price. It created insane profitability for early farmers. Small farms were making $150,000 a day. People were treating it like the next Bitcoin, which we didn’t expect.

We went from 120 petabytes when we launched the network to 30 exabytes three months later. You can imagine I was a very popular guy in May. I was on the phone with analysts at Western Digital and Seagate almost every day talking about what was going to happen. When is it going to stop? Is it just going to keep growing forever?
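For a sense of scale, growing from 120 petabytes to about 30 exabytes is a 250-fold increase. The sketch below assumes three months is about 13 weeks and backs out the constant weekly growth rate that would produce that increase. The result is roughly 53% a week on average, well above the roughly 33% weekly pace noted at the top of this post.

```python
def average_weekly_growth(start: float, end: float, weeks: float) -> float:
    """Constant weekly rate that compounds `start` into `end` over `weeks` weeks."""
    return (end / start) ** (1 / weeks) - 1


launch_pb = 120           # Netspace at launch, in petabytes
three_months_pb = 30_000  # about 30 exabytes, in petabytes
rate = average_weekly_growth(launch_pb, three_months_pb, weeks=13)
print(f"{three_months_pb / launch_pb:.0f}x overall, about {rate:.0%} per week")
```

This is an average. Actual growth was lumpier, and it slowed once profitability dropped.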

It’s not shocking that it didn’t last long. At some point profitability gets impacted, and it starts slowing down.

Q: Where do you see hard drive demand going from here?

If the price doubles or triples in a very short amount of time, we might see a rush to go buy new hardware in the short term, but it will self-correct quickly. We’ll see Netspace acceleration in proportion. We predict the next wave of growth will come from smaller farmers and pools.

Bram [Cohen, the founder of Chia] hypothesized that underutilized storage space is ubiquitous. The majority of people aren’t using all of their hard drive space. IDC believes there are about 500 exabytes of underutilized storage space sitting out in the world, so people have this equipment already. They don’t have to rush out and buy new hardware. That will largely be true for the next six months of growth. The first wave of growth was driven by new purchases. The next wave, and probably Chia’s long-term growth, will largely be driven by people who already have storage, because the marginal cost is basically zero.

The demand for storage, overall, is increasing 20% to 30% every year, and hard drives are not getting 20% to 30% bigger every year. At some point, this inevitable squeeze was always going to happen where demand for storage exceeds supply. We want to figure out how we can grow sustainably and not impact that.

We have an interesting use case for old used drives, so we’re trying to figure out what the model is. There are certainly people who want to farm Chia on the enterprise scale, but it’s just not going to be cost-competitive to buy new drives long-term.

Q: Between the big enterprise farmers and the folks just happy to farm a few plots, do you have a preference?

Today, 37% of people are farming 10-50TB and 26% are farming 50-150TB. The remaining 37% are larger farmers. Technically, the smaller the farmer, the better. That means that we’re more decentralized. Our phase one was to build out the protocol and the framework for the most decentralized, secure blockchain in the world. In under three months, we’ve actually done that. One of the metrics of decentralization is how many full nodes you have. We’re approaching 350,000 full nodes. Just by pure metrics of decentralization, we believe we are the most decentralized blockchain on the planet today.

Note: As of August 12, 2021, Bitcoin had about 220K validators at its peak and now has about 65K. Chia’s peak was about 750K, and it now hovers around 450K.

In that respect, farming is actually one of the least interesting things we’re doing. It is a way to secure the network, and that’s been very novel. Today, if you want to launch a 51% attack, you have to buy 15 exabytes and get them up on the network. We think there’s definitely less than 100 data centers in the world that can host that many exabytes. Basically, the network has to be big enough where it can’t be attacked, and we think it’s there now. It’s very hard to attack a 30 exabyte network.

Q: We know you can’t speculate on future performance, but what does the future look like for Chia?

Our vision is to basically flip Ethereum within three years. Part of the business model will be having the support teams in place to help big financial institutions utilize Chia. We also think having a dedicated engineering team who are paid salaries is a good thing.

Our president thinks we’ll be known for Chialisp, our on-chain smart contract programming language. In the same way that everything’s a file in Linux, everything’s a coin in Chia. You can create what we call “Coloured Coins,” or assets on the blockchain. So, you could tokenize a carbon credit. You could tokenize Tesla stock. You can put those on the Chia blockchain, and each one just lives as a coin. Because it’s a coin, it natively swaps and is compatible with everything else on the blockchain. There’s no special software needed. Somebody could send it to another person with the standard Chia wallet because everything’s already integrated into the software. It’s very powerful for decentralized exchanges and for assetizing and tokenizing different types of collateral on the blockchain.

Large financial institutions want to get involved with cryptocurrency, but there’s no play for them. All the financial institutions we’ve talked to have looked at Ethereum, but there are too many hacks. The code is too hard to audit. You need too much expertise to write it. And it consumes way too much energy. They’re not going to use a blockchain that’s not sustainable.

We are going to try to bridge that gap between traditional finance and the new world of cryptocurrency and decentralized finance. We think Chia can do that.

The post Next Steps for Chia, in Their Own Words appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

Logpush to B2 Cloud Storage: A Q&A With Cloudflare

Post Syndicated from Elton Carneiro original https://www.backblaze.com/blog/logpush-to-b2-cloud-storage-a-qa-with-cloudflare/

As big believers in open ecosystems, interoperability, and just making life easier for developers, Backblaze and Cloudflare share a lot—which means we’re always excited to dig into new functionality they’re providing for devs. When we heard about their new Logpush tool, I reached out to Tanushree Sharma, the product manager on this project, to learn more about why they built it, how it works with Backblaze B2 Cloud Storage, and what comes next.

Q: Tell us more about the origins of Logpush. How does it fit into the Cloudflare ecosystem and what problems is it solving for?

A: Cloudflare provides security, performance, and reliability services to customers behind our network. We analyze the traffic going through our network to perform actions such as routing traffic to the nearest data center, protecting against attacks, and blocking malicious bots. As part of providing these services for customers, we generate logs for every request in our network. Logpush makes these logs available for Enterprise customers to get visibility into their traffic, quickly and at scale.

Q: Cloudflare already offers Logpull. What’s the difference between that and Logpush?

A: Logpull requires customers to make calls to our Logpull API and then set up a storage platform and/or analytics tools to view the logs. Increasingly, we heard from customers who wanted to integrate with common log storage and analytics products. We also frequently heard that customers want their logs in real time, or as close to it as possible. We created Logpush to solve both problems. Rather than requiring customers to configure and maintain a system that makes repeated API calls for the data, Logpush lets customers configure where they would like their logs sent, and we push the logs there directly on their behalf.
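To make the pull model concrete, here is a sketch of the request-building half of a Logpull poller. It splits a time range into consecutive windows and produces one URL per window against the zone’s `logs/received` endpoint, using `start`, `end`, and `fields` query parameters. This reflects our reading of Cloudflare’s Logpull API, so treat the endpoint and parameters as assumptions to verify against the current documentation. The zone ID and window size are hypothetical, and authentication, sending, retries, and storage are all left out.

```python
from datetime import datetime, timedelta, timezone
from typing import Iterator, List
from urllib.parse import urlencode

# Logpull endpoint for a zone's request logs (assumed; auth headers omitted).
LOGPULL_URL = "https://api.cloudflare.com/client/v4/zones/{zone_id}/logs/received"


def logpull_urls(zone_id: str, start: datetime, end: datetime,
                 window: timedelta, fields: List[str]) -> Iterator[str]:
    """Yield one Logpull request URL per consecutive time window in [start, end)."""
    cursor = start
    while cursor < end:
        upper = min(cursor + window, end)
        query = urlencode({
            "start": cursor.isoformat(),
            "end": upper.isoformat(),
            "fields": ",".join(fields),
        })
        yield f"{LOGPULL_URL.format(zone_id=zone_id)}?{query}"
        cursor = upper


# Hypothetical zone ID and a three-hour backfill in one-hour windows.
start = datetime(2021, 8, 12, 0, 0, tzinfo=timezone.utc)
for url in logpull_urls("example-zone-id", start, start + timedelta(hours=3),
                        timedelta(hours=1), ["ClientIP", "EdgeStartTimestamp"]):
    print(url)
```

With Logpush, the customer no longer has to run this loop, along with the scheduling and storage code around it. Cloudflare delivers log files to the configured destination instead.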

Q: What makes it compelling to Cloudflare customers? Are there specific use cases you can touch on? Any light you can shed on how a beta tester used it when you first announced it?

A: Logpush makes it very easy for customers to export data. They simply set up a job using the Logpush API or with the click of a few buttons in the Cloudflare dashboard. From there, customers can combine Cloudflare logs with those of other tooling in their infrastructure, such as a SIEM or marketing tracking tools.

This combined data is very useful not only for day-to-day monitoring, but also when conducting network forensics after an attack. For example, a typical L7 DDoS attack originates from a handful of IP addresses. Customers can use platform-wide analytics to understand the activity of IP addresses from both within the Cloudflare network and other applications in their infrastructure. Platform-wide analytics are very powerful in giving customers a holistic view of their entire system.
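Once logs land in storage, finding the handful of addresses behind a flood of requests can start with a simple count. The following is a minimal sketch. It assumes the logs have already been downloaded from the bucket and are newline-delimited JSON records with a `ClientIP` field, as in Cloudflare’s HTTP request logs. The sample records are made up and use documentation IP ranges.

```python
import json
from collections import Counter
from typing import Iterable, List, Tuple


def top_client_ips(ndjson_lines: Iterable[str], n: int = 5) -> List[Tuple[str, int]]:
    """Count requests per ClientIP in newline-delimited JSON logs; return the busiest."""
    counts: Counter = Counter()
    for line in ndjson_lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        record = json.loads(line)
        ip = record.get("ClientIP")
        if ip:
            counts[ip] += 1
    return counts.most_common(n)


sample = [
    '{"ClientIP": "203.0.113.7", "ClientRequestPath": "/login"}',
    '{"ClientIP": "203.0.113.7", "ClientRequestPath": "/login"}',
    '{"ClientIP": "198.51.100.23", "ClientRequestPath": "/"}',
    '{"ClientIP": "203.0.113.7", "ClientRequestPath": "/login"}',
    "",
]
print(top_client_ips(sample, n=2))  # [('203.0.113.7', 3), ('198.51.100.23', 1)]
```

In practice you would join these counts with logs from other systems by IP address to get the platform-wide view.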

Q: What sparked the push to support more S3-compatible storage destinations for Logpush data?

A: S3-compatible storage is becoming an industry standard for cloud storage. With the increased adoption of S3-compatible storage, we thought it would be a great spot for us to create our own endpoint to be able to serve more platforms.

Q: This isn’t the first time Backblaze and Cloudflare have worked together. In the spirit of building a better internet, we’ve helped a number of companies reduce data transfer fees via the Bandwidth Alliance. How did this affect your decision to include B2 Cloud Storage as one of these storage destinations and how is it serving Cloudflare and its customers’ needs?

A: Cloudflare values open ecosystems in technology: we believe that customers should not have to be locked in to any single provider. We started the Bandwidth Alliance to reduce or eliminate egress fees, which gives customers the ability to select a set of options that work best for them. With Backblaze as a longtime Bandwidth Alliance member, including B2 Cloud Storage out of the gate was a no-brainer!

This case study on why Nodecraft made the switch from AWS S3 to Backblaze B2 Cloud Storage is a great illustration of how the Bandwidth Alliance can benefit customers.

Q: What was the process of integrating B2 Cloud Storage within the Logpush framework?

A: We worked with the great folks at Backblaze to integrate B2 Cloud Storage as a storage destination. This process began by modeling out costs, which were greatly reduced thanks to discounted egress through the Bandwidth Alliance. For the S3-compatible integration, our team used the AWS SDK for Go to integrate with Backblaze. Once we had verified that the integration was working, we created an intuitive UI-based workflow to make it easier for customers to create and configure Logpush jobs.

Q: What can we look forward to as Logpush matures? Anything exciting on the horizon that you’d like to share?

A: One of the big areas that our team is focusing on is data sovereignty. We want customers to have control over where their data is stored and processed. We’re also working on building out Logpush by adding data sets and giving customers more customization with their logs.

Stay tuned to our Logs blog for upcoming releases!

Q: As a Cloudflare customer, where do I begin if I want to utilize Logpush? Walk us through the setup process of selecting B2 Cloud Storage as a destination for my logs.

A: Once your B2 Cloud Storage destination is set up, you can create an S3-compatible endpoint on Cloudflare’s dashboard or API by specifying the B2 Cloud Storage destination information. For a detailed walkthrough on how to create a Logpush job, take a look at our documentation on enabling S3-compatible endpoints.
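For the API route, here is a sketch of what the request body might look like, based on our reading of Cloudflare’s documentation. A job is created with a POST to the zone’s `logpush/jobs` endpoint. The S3-compatible destination is a `destination_conf` string that carries the region, access key ID, secret key, and endpoint. Treat the exact parameter names and URL encoding as assumptions to check against Cloudflare’s current docs. The bucket, region, endpoint, and credential values are placeholders, and the sketch only builds the body without sending it.

```python
import json
from urllib.parse import urlencode


def b2_destination_conf(bucket_path: str, region: str, endpoint: str,
                        key_id: str, application_key: str) -> str:
    """Build an S3-compatible destination string pointing Logpush at a B2 bucket."""
    query = urlencode({
        "region": region,
        "access-key-id": key_id,
        "secret-access-key": application_key,
        "endpoint": endpoint,
    })
    return f"s3://{bucket_path}?{query}"


def logpush_job_body(name: str, destination_conf: str,
                     dataset: str = "http_requests") -> str:
    """JSON body for creating a Logpush job (field names assumed from Cloudflare docs)."""
    return json.dumps({
        "name": name,
        "destination_conf": destination_conf,
        "dataset": dataset,
        "enabled": True,
    })


# All values below are placeholders.
conf = b2_destination_conf(
    bucket_path="my-log-bucket/cloudflare",
    region="us-west-004",
    endpoint="s3.us-west-004.backblazeb2.com",
    key_id="<B2_KEY_ID>",
    application_key="<B2_APPLICATION_KEY>",
)
print(logpush_job_body("b2-http-logs", conf))
```

Percent-encoding the query values with `urlencode` keeps characters in the secret key (such as `/` or `+`) from being misread as URL syntax.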

Many thanks, Tanushree. We’re excited to see how customers put this new tool to work.

The post Logpush to B2 Cloud Storage: A Q&A With Cloudflare appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.