Tag Archives: Cloud Storage

Backing Up FreeNAS and TrueNAS to Backblaze B2

Post Syndicated from Roderick Bauer original https://www.backblaze.com/blog/how-to-setup-freenas-cloud-storage/

FreeNAS and TrueNAS

Thanks to recent updates of FreeNAS and TrueNAS, backing up data to Backblaze B2 Cloud Storage is now available for both platforms. FreeNAS/TrueNAS v11.1 adds a feature called Cloud Sync, which lets you sync, move, or copy data to and from Backblaze B2.

What Are FreeNAS and TrueNAS?

FreeNAS and TrueNAS are two faces of a comprehensive NAS storage environment built on the FreeBSD OS and OpenZFS file system. FreeNAS is the open source and development platform, while TrueNAS is the supported and commercial product line offered by iXsystems.

FreeNAS logo

FreeNAS is for the DIY crowd. If you don’t mind working with bleeding-edge software and figuring out how to make your software and hardware work harmoniously, then FreeNAS could be a good choice for you.

TrueNAS logo

If you’re in a business or other environment with critical data, then a fully supported product like TrueNAS is likely the way you’ll want to go. iXsystems builds their TrueNAS commercial server appliances on the battle-tested, open source framework that FreeNAS and OpenZFS provide.

The software developed by the FreeNAS open source community forms the basis for both platforms, so we’ll talk specifically about FreeNAS in this post.

Working with FreeNAS

You can download FreeNAS directly from the open source project website, freenas.org. Once installed, FreeNAS is managed through a comprehensive web interface that is supplemented by a minimal shell console that handles essential administrative functions. The web interface supports storage pool configuration, user management, sharing configuration, and system maintenance.

FreeNAS web UI

FreeNAS supports Windows, macOS and Unix clients.

Syncing to B2 with FreeNAS

Files or directories can be synchronized to remote cloud storage providers, including B2, with the Cloud Sync feature.

Selecting Tasks ‣ Cloud Sync shows the screen below. This screen shows a single cloud sync called “backup-acctg” that “pushes” a file to cloud storage. The last run finished with a status of SUCCESS.

Existing cloud syncs can be run manually, edited, or deleted with the buttons that appear when a single cloud sync line is selected by clicking with the mouse.

FreeNAS Cloud Sync status

Cloud credentials must be defined before a cloud sync is created. One set of credentials can be used for more than one cloud sync. For example, a single set of credentials for Backblaze B2 can be used for separate cloud syncs that push different sets of files or directories.

A cloud storage area must also exist. With B2, these are called buckets and must be created before a sync task can be created.
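If you prefer to script that step rather than use the B2 web console, here is a minimal Python sketch using the documented b2_authorize_account and b2_create_bucket API calls. The account ID, application key, and bucket name below are placeholders, and error handling is omitted for brevity.

import requests

ACCOUNT_ID = "your_b2_account_id"            # placeholder
APPLICATION_KEY = "your_b2_application_key"  # placeholder

# Step 1: authorize to get an API URL and an authorization token.
auth = requests.get(
    "https://api.backblazeb2.com/b2api/v1/b2_authorize_account",
    auth=(ACCOUNT_ID, APPLICATION_KEY),
).json()

# Step 2: create the bucket the Cloud Sync task will push into.
bucket = requests.post(
    auth["apiUrl"] + "/b2api/v1/b2_create_bucket",
    headers={"Authorization": auth["authorizationToken"]},
    json={
        "accountId": ACCOUNT_ID,
        "bucketName": "cloudsync-bucket",  # bucket names must be unique across B2
        "bucketType": "allPrivate",
    },
).json()

print(bucket["bucketId"])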

After the credentials and receiving bucket have been created, a cloud sync task is created with Tasks ‣ Cloud Sync ‣ Add Cloud Sync. The Add Cloud Sync dialog is shown below.

FreeNAS Cloud Sync credentials

Cloud Sync Options

The options for Cloud Sync are listed below, with the value type for each setting in parentheses:

  • Description (string): a descriptive name for this Cloud Sync
  • Direction (string): Push to send data to cloud storage, or Pull to retrieve data from cloud storage
  • Provider (drop-down menu): select the cloud storage provider; the list of providers is defined by Cloud Credentials
  • Path (browse button): select the directories or files to be sent for Push syncs, or the destinations for Pull syncs
  • Transfer Mode (drop-down menu): Sync (default) makes files on the destination system identical to those on the source; files removed from the source are removed from the destination (like rsync --delete). Copy copies files from the source to the destination, skipping files that are identical (like rsync). Move copies files from the source to the destination, deleting files from the source after the copy (like mv)
  • Minute (slider or minute selections): select Every N minutes and use the slider to choose a value, or select Each selected minute and choose specific minutes
  • Hour (slider or hour selections): select Every N hours and use the slider to choose a value, or select Each selected hour and choose specific hours
  • Day of month (slider or day of month selections): select Every N days of month and use the slider to choose a value, or select Each selected day of month and choose specific days
  • Month (checkboxes): months when the Cloud Sync runs
  • Day of week (checkboxes): days of the week when the Cloud Sync runs
  • Enabled (checkbox): uncheck to temporarily disable this Cloud Sync

Take care when choosing a Direction. Most of the time, Push will be used to send data to the cloud storage. Pull retrieves data from cloud storage, but be careful: files retrieved from cloud storage will overwrite local files with the same names in the destination directory.

Provider is the name of the cloud storage provider. These providers are defined by entering credentials in Cloud Credentials.

After the Provider is chosen, a list of available cloud storage areas from that provider is shown. With B2, this is a drop-down with names of existing buckets.

Path is the path to the directories or files on the FreeNAS system. On Push jobs, this is the source location for files sent to cloud storage. On Pull jobs, the Path is where the retrieved files are written. Again, be cautious about the destination of Pull jobs to avoid overwriting existing files.

The Minute, Hour, Days of month, Months, and Days of week fields permit creating a flexible schedule of when the cloud synchronization takes place.

Finally, the Enabled field makes it possible to temporarily disable a cloud sync job without deleting it.

FreeNAS Cloud Sync Example

This example shows a Push cloud sync which writes an accounting department backup file from the FreeNAS system to Backblaze B2 storage.

Before the new cloud sync was added, a bucket called “cloudsync-bucket” was created with the B2 web console for storing data from the FreeNAS system.

System ‣ Cloud Credentials ‣ Add Cloud Credential is used to enter the credentials for storage on a Backblaze B2 account. The credential is given the name B2, as shown in the image below:

FreeNAS Cloud Sync B2 credentials

Note on encryption: FreeNAS 11.1 Cloud Sync does not support client-side encryption of data and file names before syncing to the cloud, whether the destination is B2 or another public cloud provider. That capability will be available in FreeNAS v11.2, which is currently in beta.

Example: Adding Cloud Credentials

The local data to be sent to the cloud is a single file called accounting-backup.bin on the smb-storage dataset. A cloud sync job is created with Tasks ‣ Cloud Sync ‣ Add Cloud Sync.

The Description is set to “backup-acctg” to describe the job. This data is being sent to cloud storage, so this is a Push. The provider comes from the cloud credentials defined in the previous step, and the destination bucket “cloudsync-bucket” has been chosen.

The Path to the data file is selected.

The remaining fields are for setting a schedule. The default is to send the data to cloud storage once an hour, every day. The options provide great versatility in configuring when a cloud sync runs, anywhere from once a minute to once a year.

The Enabled field is checked by default, so this cloud sync will run at the next scheduled time.

The completed dialog is shown below:

FreeNAS Cloud Sync example

Dependable and Economical Disaster Recovery

In the event of an unexpected data-loss incident, the VMs, files, or other data stored in B2 from FreeNAS or TrueNAS are available for recovery. Having that data ready and available in B2 provides a dependable, easy, and cost effective offsite disaster recovery solution.

Are you using FreeNAS or TrueNAS? What tips do you have? Let us know in the comments.

The post Backing Up FreeNAS and TrueNAS to Backblaze B2 appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

Minio as an S3 Gateway for Backblaze B2 Cloud Storage

Post Syndicated from Roderick Bauer original https://www.backblaze.com/blog/how-to-use-minio-with-b2-cloud-storage/

Minio + B2

While there are many choices when it comes to object storage, the largest and most widely recognized provider is Amazon. Amazon’s set of APIs for interacting with its cloud storage, often just called “S3,” is frequently the first integration point for an application or service that needs to send data to the cloud.

One of the more frequent questions we get is “how do I jump from S3 to B2 Cloud Storage?” We’ve previously highlighted many of the direct integrations that developers have built on B2: here’s a full list.

Another way to work with B2 is to use what is called a “cloud storage gateway.” A gateway is a service that acts as a translation layer between two services. In the case of Minio, it enables customers to take something that was integrated with the S3 API and immediately use it with B2.

Before going further, you might ask “why didn’t Backblaze just create an S3 compatible service?” We covered that topic in a recent blog post, Design Thinking: B2 APIs (& The Hidden Costs of S3 Compatibility). The short answer is that our architecture enables some useful differentiators for B2. Perhaps most importantly, it enables us to sustainably offer cloud storage at ¼ of the price of S3, which you will really appreciate as your application or service grows.

However, there are situations when a customer is already using the S3 APIs in their infrastructure and wants to understand all the options for switching to B2. For those customers, gateways like Minio can provide an elegant solution.

What is Minio?

Minio is an open source, multi-cloud object storage server and gateway with an Amazon S3 compatible API. Having an S3-compatible API means that, once configured, Minio acts as a gateway to B2, automatically and transparently putting data into or pulling data out of a Backblaze B2 account.

Backup, archive or other software that supports the S3 protocol can be configured to point at Minio. Minio internally translates all the incoming S3 API calls into equivalent B2 storage API calls, which means that all Minio buckets and objects are stored as native B2 buckets and objects. The S3 object layer is transparent to the applications that use the S3 API. This enables the simultaneous use of both Amazon S3 and B2 APIs without compromising any features.

Minio has become a popular solution, with more than 113.7 million Docker pulls. Minio implements the Amazon S3 v2/v4 API in the Minio client, the AWS SDK, and the AWS CLI.
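As an illustration, here is a short, hypothetical Python sketch using boto3 (the AWS SDK for Python) pointed at a local Minio gateway. It assumes the gateway is listening on port 9000 and was started with your B2 account ID and application key as its access and secret keys; the bucket and file names are placeholders.

import boto3

# Point the standard AWS SDK at the local Minio gateway instead of Amazon S3.
s3 = boto3.client(
    "s3",
    endpoint_url="http://localhost:9000",
    aws_access_key_id="b2_account_id",           # placeholder
    aws_secret_access_key="b2_application_key",  # placeholder
)

# Minio translates these S3 calls into native B2 API calls behind the scenes.
for bucket in s3.list_buckets()["Buckets"]:
    print(bucket["Name"])

s3.upload_file("backup.tar.gz", "my-b2-bucket", "backup.tar.gz")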

Minio and B2

To try it out, we configured a MacBook Pro with a Docker container for the latest version of Minio. It was a straightforward matter to install the community version of Docker on our Mac and then install the container for Minio.

You can follow the instructions on GitHub for configuring Minio on your system.

In addition to using Minio with S3-compatible applications and creating new integrations using their SDK, one can use Minio’s Command-line Interface (CLI) and the Minio Browser to access storage resources.

Command-line Access to B2

We installed the Minio client (mc), which provides a modern CLI alternative to UNIX coreutils such as ls, cat, cp, mirror, diff, etc. It supports filesystems and Amazon S3 compatible cloud storage services. The Minio client is supported on Linux, Mac, and Windows platforms.

We used the command below to add the alias “myb2” to our host to make it easy to access our data.

mc config host add myb2 \
 http://localhost:9000 b2_account_id b2_application_key

Minio client commands

Once configured, you can use mc subcommands like ls, cp, mirror to manage your data.

Here’s the Minio client command to list our B2 buckets:

mc ls myb2

And the result:

Minio client

Browsing Your B2 Buckets

Minio Gateway comes with an embedded web based object browser that makes it easy to access your buckets and files on B2.

Minio browser

Minio is a Great Way to Try Out B2

Minio is designed to be straightforward to deploy and use. If you’re using an S3-compatible integration, or just want to try out Backblaze B2 using your existing knowledge of S3 APIs and commands, then Minio can be a quick solution to getting up and running with Backblaze B2 and taking advantage of the lower cost of B2 cloud storage.

The post Minio as an S3 Gateway for Backblaze B2 Cloud Storage appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

Design Thinking: B2 APIs (& The Hidden Costs of S3 Compatibility)

Post Syndicated from Brian Wilson original https://www.backblaze.com/blog/design-thinking-b2-apis-the-hidden-costs-of-s3-compatibility/

API Functions - Authorize, Download, List, Upload

When we get asked, “why did Backblaze make its own set of APIs for B2?” the question behind the question is most often “why didn’t Backblaze just implement an S3-compatible interface for B2?”

Both are totally reasonable questions to ask. The quick answer to either question? So our customers and partners can move faster while simultaneously enabling Backblaze to sustainably offer a cloud storage service that is ¼ of the price of S3.

But, before we jump all the way to the end, let me step you through our thinking.

The Four Major Functions of Cloud Storage APIs

Throughout cloud storage, S3 or otherwise, APIs are meant to mainly provide access to four major underlying functions:

  • Authorization — providing account/bucket/file access
  • Upload — Sending files to the cloud
  • Download — Data retrieval
  • List — Data checking / selection / comparison

The comparison between B2 and S3 on the List and Download functions is, candidly, not that interesting. Fundamentally, we ended up having similar approaches when solving those challenges. If the detail is of interest, I’m happy to get into that on a later post or answer questions in the comments below.

Backblaze and Amazon did take different approaches to how each service handles Authorization. The 5 step approach for S3 is well outlined here. B2’s architecture enables secure authorization in just 2 steps. My assertion is that a 2 step architecture is ~60% simpler than having a 5 step approach. To understand what we’re doing, I’d like to introduce the concept of Backblaze’s “Contract Architecture.”

The easiest way to understand B2’s Contract Architecture is to deep dive into how we handle the Upload process.

Uploads (Load Balancing vs Contract Architecture)

The interface to upload data into Amazon S3 is actually a bit simpler than Backblaze B2’s API. But it comes at a literal cost. It requires Amazon to have a massive and expensive choke point in their network: load balancers. When a customer tries to upload to S3, she is given a single upload URL to use. For instance, http://s3.amazonaws.com/<bucketname>. This is great for the customer as she can just start pushing data to the URL. But that requires Amazon to be able to take that data and then, in a second step behind the scenes, find available storage space and push that data to that location. The second step creates a choke point because it requires high bandwidth load balancers. That, in turn, carries a significant customer implication: load balancers cost serious money.

When we were creating the B2 APIs, we faced a dilemma — do we go a simple but expensive route like S3? Or is there a way to remove significant cost even if it means introducing some slight complexity? We understood that there are perfectly great reasons to go either way — and there are customers at either end of this decision tree.

We realized the expense savings could be significant; we know load balancers well. We use them for our Download capabilities. They are expensive so, to run a sustainable service, we charge for downloads. That B2 download pricing is 1¢/GB while Amazon S3 starts at 9¢/GB is a subject we covered in a prior blog post.

Back to the B2 Upload function. With our existing knowledge of the “expensive” design, the next step was to understand the alternative path. We found a way to create significant savings by only introducing a modest level of complexity. Here’s how: When a “Client” wants to push data to the servers, it does not just start uploading data to a “well known URL” and have the SERVER figure out where to put the data. At the start, the Client contacts a “Dispatching Server” that has the job of knowing where there is optimally available space in a Backblaze data center.

The Dispatching Server (the API server answering the b2_get_upload_url call) tells the Client “there is space over on Vault-8329.” This next step is our magic. Armed with the knowledge of the open vault, the Client ends its connection with the Dispatching Server and creates a brand new request DIRECTLY to Vault-8329 (calling b2_upload_file or b2_upload_part). No load balancers involved! This is guaranteed to scale infinitely for very little overhead cost. A side note is that the client can continue to directly call b2_upload_file repeatedly from now on (without asking the dispatch server ever again), up until it gets the response indicating that particular vault is full. In other words, this does NOT double the number of network requests.

The “Contract” concept emanates from a simple truth: all APIs are contracts between two entities (machines). Since the Client knows exactly where to go and exactly what authorization to bring with it, it can establish a secure “contract” with the Vault specified by the Dispatching Server.[1] The modest complexity only comes into play if Vault-8329 fills up, gets too busy, or goes offline. In that case, the Client will receive either a 500 or 503 error as notification that the contract has been terminated (in effect, it’s a firm message that says “stop uploading to Vault-8329, it doesn’t have room for more data”). When this happens, the Client is responsible for going BACK to the Dispatching Server, asking for a new vault, and retrying the upload to a different vault. In the scenario where the Client has to go back to the Dispatching Server, the “two phase” process becomes more work for the Client versus S3’s singular “well known URL” architecture. Of course, this is all handled at the code level and is well documented. In effect, your code just needs to know that “if you receive a 500 block error, just retry.” It’s free, it’s easy, and it will work.
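To make the flow concrete, here is a rough Python sketch of the client side of that contract, built on the documented b2_get_upload_url and b2_upload_file calls. The api_url, auth_token, and bucket_id values are assumed to come from an earlier b2_authorize_account call, and error handling is simplified.

import hashlib
import requests

def upload_with_retry(api_url, auth_token, bucket_id, file_name, data, max_attempts=5):
    # Ask the dispatching server for a vault, upload directly to it, and go
    # back for a new vault if the current one answers with a 500 or 503.
    for attempt in range(max_attempts):
        # Step 1: b2_get_upload_url asks the dispatching server where there is room.
        r = requests.post(
            api_url + "/b2api/v1/b2_get_upload_url",
            headers={"Authorization": auth_token},
            json={"bucketId": bucket_id},
        )
        r.raise_for_status()
        target = r.json()  # contains uploadUrl and an upload authorizationToken

        # Step 2: b2_upload_file sends the data DIRECTLY to the assigned vault.
        resp = requests.post(
            target["uploadUrl"],
            headers={
                "Authorization": target["authorizationToken"],
                "X-Bz-File-Name": file_name,  # URL-encode names with special characters
                "Content-Type": "b2/x-auto",
                "X-Bz-Content-Sha1": hashlib.sha1(data).hexdigest(),
            },
            data=data,
        )
        if resp.status_code in (500, 503):
            continue  # the contract ended; ask the dispatching server again
        resp.raise_for_status()
        return resp.json()
    raise RuntimeError("upload failed after %d attempts" % max_attempts)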

So while the Backblaze approach introduces some modest complexity, it can quickly and easily be reliably handled with code. Looking at S3’s approach, it is certainly simpler, but it results in three expensive consequences:

1) Expensive fixed costs. Amazon S3 has a single upload URL choke point that requires load balancers and extremely high bandwidth. Backblaze’s architecture does not require moving data around internally; this lets us use commodity 10 Gb/s connections that are affordable and will scale infinitely. Further, as discussed above, load balancing hardware is expensive. By removing it from our Upload system, we remove a significant cost center.

2) Expensive single URL availability issues. Amazon S3’s solution requires high availability of the single upload URL for massive amounts of data. The Contract concept from Backblaze works more reliably, but does add slight additional complexity when (rare) extra network round trips are needed.

3) Expensive, time consuming data copy needs (and “eventual consistency”). Amazon S3 requires the copying of massive amounts of data from one part of their network (the upload server) to wherever the data’s ultimate resting place will be. This is at the root of one of the biggest frustrations when dealing with S3: Amazon’s “eventual consistency.” It means that you can’t use your data until it has been pushed to all the places it needs to go. As the article notes, this is usually fast, but can take a material amount of time, at any time. The lack of predictability around access times is something anyone dealing with S3 is all too familiar with.

The B2 architecture offers what one could consider “strong consistency.” There are different definitions of that idea. Ours is that the client connects DIRECTLY with the correct final location for the data to land. Once our system has confirmed a write, the data has been written to enough places that we can guarantee that the data can be seen without delay.

Was our decision a good one? Customers will continue to vote on that, but it appears that the marginal complexity is more than offset by the fact that B2 is a sustainable service offered at ¼ of S3’s price.

But Seriously, Why Aren’t You Just “S3 Compatible?”

The use of Object Storage requires some sort of interface. You can build it yourself by learning your vendor’s APIs or you can go through a third party integration. Regardless of what route you choose, somebody is becoming fluent in the vendor’s APIs. And beyond the difference in cost, there’s a reliability component.

This is a good time to clear up a common misconception. The S3 protocol is not a specification: it’s an API doc. Why does this matter? Because API docs leave many outcomes undocumented. For instance, when one uses S3’s list_files function, a developer canNOT know what is going to happen just by reading the API docs. Compounding this issue is the sheer scope of the S3 API; it is huge and expanding. Systems that purport to be “S3 compatible” are unlikely to implement the full API and have to document whatever subset they implement. Once that is done, they will have to work with integration partners and customers to communicate what subset they choose as “important.”

Ultimately, we have chosen to create robust documentation describing, among other things, the engineering specification (this input returns that output, here’s how B2 handles various error cases, etc).

With hundreds of integrations from third parties and hundreds of thousands of customers, it’s clear that our APIs have proven easy to implement. The reality is the first time anyone implements cloud storage into their application it can take weeks. The first move into the cloud can be particularly tough for legacy applications. But the marginal cloud implementation can be reliably completed in days, if not hours, if the documentation is clear and the APIs can be well understood. I’m pleased that we’ve been able to create a complete solution that is proving quite easy to use.

And it’s a big deal that B2 is free of the “load balancer problem.” It solves for a huge scaling issue. When we roll out new vaults in new data centers in new countries, the clients are contacting those vaults DIRECTLY (over whatever network path is shortest) and so there are fewer choke points in our architecture.

It all means that, over an infinite time horizon, our customers can rely on B2 as the most affordable, easiest to use cloud storage on the planet. And, at the end of the day, if we’re doing that, we’ve done the right thing.


[1] The Contract Architecture also explains how we got to a secure two step Authorization process. When you call the Dispatching Server, we run the authentication process and then give you a Vault for uploads and an Auth token. When you are establishing the contract with the Vault, the Auth token is required before any other operation can begin.

The post Design Thinking: B2 APIs (& The Hidden Costs of S3 Compatibility) appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

What’s New In B2: Application Keys + Java SDK

Post Syndicated from Yev original https://www.backblaze.com/blog/b2-application-keys/

B2 Application Keys

It’s been a few months since our last “What’s New In B2” blog post, so we wanted to highlight some goings on and also introduce a new B2 feature!

Reintroducing: Java SDK + Compute Partnerships

We wanted to highlight the official Backblaze B2 Java SDK which can be found in our GitHub repo. The official Java SDK came out almost a year ago, but we’ve been steadily updating it since then with help from the community.

We’ve also announced some Compute Partnerships which give folks all the benefits of Backblaze B2’s low-cost cloud storage with the computing capabilities of Packet and ServerCentral. Backblaze B2 Cloud Storage is directly connected to these compute providers, offering customers low latency and free data transfers between the compute providers and B2 Cloud Storage.

Application Keys

Application keys give developers more control over who can do what and for how long with their B2 data. We’ve had the B2 application key documentation out for a while, and we’re ready to take off the “coming soon” tag.

row of keys

What are Application Keys?

In B2, the main application key has root access to everything and essentially controls every single operation that can be done inside B2. With the introduction of additional application keys, developers now have more flexibility.

Application keys are scoped by three things: 1) what operations the key can perform, 2) what paths inside of B2 the key can access, and 3) how long the key remains valid. For example, you might use a “read-only” key that only has access to one B2 bucket. You’d use that read-only key in situations where you don’t actually need to write things to the bucket, only read or “display” them. Or, you might use a “write-only” key which can only write to a specific folder inside of a bucket. All of this leads to cleaner code with segmented operations, essentially acting as firewalls should something go awry.
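As a sketch of what that scoping looks like at the API level, the snippet below uses the documented b2_create_key call to make a read-only key restricted to a single bucket and folder prefix. The api_url, account_auth_token, account_id, and bucket_id values are assumed to come from an earlier b2_authorize_account call, and the key name and prefix are placeholders.

import requests

def create_readonly_key(api_url, account_auth_token, account_id, bucket_id):
    # Create a key that can only list and read files under one folder of one
    # bucket, and that expires after one day.
    r = requests.post(
        api_url + "/b2api/v2/b2_create_key",
        headers={"Authorization": account_auth_token},
        json={
            "accountId": account_id,
            "keyName": "acctg-read-only",      # placeholder
            "capabilities": ["listBuckets", "listFiles", "readFiles"],
            "bucketId": bucket_id,             # restrict the key to one bucket
            "namePrefix": "reports/",          # placeholder folder prefix
            "validDurationInSeconds": 86400,   # optional expiration
        },
    )
    r.raise_for_status()
    return r.json()  # includes applicationKeyId and applicationKey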

Application keys dialog screenshot

Use Cases for Application Keys

One example of how you’d use an application key is for a standard backup operation. If you’re backing up an SQL database, you do not need to use your root level key to do so. Simply creating a key that can only upload to a specified folder is good enough.

Another example is that of a developer building apps that run on client devices. That developer would want to restrict access and limit the privileges of each client to specific buckets and folders — usually based on the client that is doing the operation. Using more locked-down application keys limits the possibility that one rogue client can affect the entire system.

A final case could be a Managed Service Provider (MSP) who creates and uses a different application key for each client. That way, neither the client nor the MSP can accidentally access the files of another client. In addition, an MSP could have multiple application keys for a given client that define different levels of data access for given groups or individuals within the client’s organization.

We Hope You Like It

Are you one of the people that’s been waiting for application key support? We’d love to hear your use cases so sound off in the comments below with what you’re working on!

The post What’s New In B2: Application Keys + Java SDK appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

Five Tips For Creating a Predictable Cloud Storage Budget

Post Syndicated from Andy Klein original https://www.backblaze.com/blog/calculate-cost-cloud-storage/

Cloud Storage $$$, Transfer Rates $, Download Fees $$, Cute Piggy Bank $$$

Predicting your cloud storage cost should be easy. After all, there are only three cost dimensions: 1) storage (the rental for your slice of the cloud), 2) download (the fee to bring your data out of the cloud), and 3) transactions (charges for “stuff” you might do to your data inside the cloud). Yet, you probably know someone (you?) who was more than surprised when their cloud storage bill arrived. They’re in good company: according to ZDNet, 37% of IT executives found their cloud storage costs to be unpredictable.

Here are five tips you can use when doing your due diligence on the cloud storage vendors you are considering. The goal is to create a cloud storage forecast that you can rely on each and every month.

Tip # 1 — Don’t Miscalculate Progressive (or is it Regressive?) Pricing Tiers

The words “Next” or “Over” on a pricing table are never a good thing.

Standard Storage Pricing Example

  • First 50 TB / Month $0.026 per GB
  • Next 450 TB / Month $0.025 per GB
  • Over 500 TB / Month $0.024 per GB

Those words mean there are tiers in the pricing table which, in this case, means you have to reach a specific level to get better pricing. You don’t get a retroactive discount — only the data above the minimum threshold enjoys the lower price.

The mistake sometimes made is calculating your entire storage cost based on the level for that amount of storage. For example, if you had 600 TB of storage, you could wrongly multiply as follows:

(600,000 x 0.024) = $14,400/month

When, in fact, you should do the following:

(50,000 x 0.026) + (450,000 x 0.025) + (100,000 x 0.024) = $14,950/month

That was just for storage. Make sure you consider the tiered pricing tables for data retrieval as well.
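To make the difference concrete, here is a small Python sketch of how tiered pricing accrues. The tier boundaries and rates are the example’s numbers above, not any particular vendor’s current price list.

def tiered_storage_cost(gb):
    # Price tiers from the example above, in (tier size in GB, $ per GB) pairs.
    tiers = [(50_000, 0.026), (450_000, 0.025), (float("inf"), 0.024)]
    cost, remaining = 0.0, gb
    for size, price in tiers:
        used = min(remaining, size)
        cost += used * price
        remaining -= used
        if remaining <= 0:
            break
    return cost

print(tiered_storage_cost(600_000))  # 14950.0, not 600,000 x 0.024 = 14400.0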

Tip # 2 — Don’t Choose the Wrong Service Level

Many cloud storage providers offer multiple levels of service. The idea is that you can trade service capabilities for cost. If you don’t need immediate access to your files or don’t want data replication or eleven 9s of durability, there is a choice for you. Besides giving away functionality, there’s a bigger problem. You have to know what you are going to do with your data to pick the right service because mistakes can get very expensive. For example:

  • You choose a low cost service tier that normally takes hours or days to restore your data. What can go wrong? You need some files back immediately and you end up paying 10-20 times the cost to expedite your restore.
  • You choose one level of service and decide you want to upload some data to a compute-based application or to another region — features not part of your current service. The good news? You can usually move the data. The bad news? You are charged a transfer fee to move the data within the same vendor’s infrastructure because you didn’t choose the right service tier when you started. These fees often eradicate any “savings” you had gotten from the lower priced tier.

Basically, if your needs change as they pertain to the data you have stored, you will pay more than you expect to get all that straightened out.

Tip # 3 — Don’t Pay for Deleted Files

Some cloud storage companies have a minimum amount of time you are charged for storage for each file uploaded. Typically this minimum period is between 30 and 90 days. You are charged even if you delete the file before the minimum period. For example (assuming a 90 day minimum period), if you upload a file today and delete the file tomorrow, you still have to pay for storing that deleted file for the next 88 days.

This “feature” often extends to files deleted due to versioning. Let’s say you want to keep three versions of each file, with older versions automatically deleted. If the now deleted versions were originally uploaded fewer than 90 days ago, you are charged for storing them for 90 days.

Using a typical backup scenario, let’s say you are using a cloud storage service to store your files and your backup program is set to a 30 day retention. That means you will be perpetually paying for an additional 60 days’ worth of storage (for files that were pruned at 30 days). In other words, you would be paying for a 90 day retention period even though you only have 30 days’ worth of files.

Tip # 4 — Don’t Pay For Nothing

Some cloud storage vendors charge a minimum amount each month regardless of how little you have stored. For example, even if you only have 100 GB stored you get to pay like you have 1 TB (the minimum). This is the moral equivalent of a bank charging you a monthly fee if you don’t meet the minimum deposit amount.

Continuing on the theme of paying for nothing, be on the lookout for services that charge a minimum amount for each file stored, regardless of how small the file is, including zero bytes. For example, some storage services have a minimum file size of 128K. Any files smaller than that are counted as being 128K for storage purposes. While the additional cost for even a couple of million zero-length files is trivial, you’re still being charged something for nothing.

Tip # 5 — Be Suspicious of the Fine Print

Misdirection is the art of getting you to focus on one thing so you don’t focus on other things going on. Practiced by magicians and some cloud storage companies, the idea is to get you to focus on certain features and capabilities without delving below the surface into the fine print.

Read the fine print. As you stroll through the multi-page pricing tables and the linked pages of rules that shape how you can use a given cloud storage service, stop and ask, “what are they trying to hide?” If you find phrases like “we reserve the right to limit your egress traffic,” “new users get a free usage tier for 12 months,” or “provisioned requests should be used when you need a guarantee that your retrieval capacity will be available when you need it,” take heed.

How to Build a Predictable Cloud Storage Budget

As we noted previously, cloud storage costs are composed of three dimensions: storage, download, and transactions. These are the cost drivers for cloud storage providers, and as such are the most straightforward way for service providers to pass the cost of the service on to their customers.

Let’s start with data storage as it is the easiest for a company to calculate. For a given month data storage cost is equal to:

Current data + new data – deleted data

Take that total and multiply it by the monthly storage rate and you’ll get your monthly storage costs.

Computing download and transaction costs can be harder as these are variables you may have never calculated before, especially if you previously were using in-house or LTO-based storage. To help you out, below is a chart showing the breakdown of the revenue from Backblaze B2 Cloud Storage over the past 6 months.

% of Spend w/ B2

As you can see, download (2%) and transaction (3%) costs are, on average, minimal compared to storage costs. Unless you have reason to believe you are different, using these figures is a good proxy for your costs.
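Putting the storage formula and those proxy percentages together, a rough Python sketch of a monthly forecast might look like this. The $0.005/GB storage rate is B2’s published price at the time of this post, and the 2% and 3% shares are the averages from the chart above; treat all of it as an estimate, not a quote.

def forecast_monthly_cost(current_tb, new_tb, deleted_tb,
                          storage_rate_per_gb=0.005,
                          download_share=0.02, transaction_share=0.03):
    # Storage cost for the month, from the simple formula above.
    stored_gb = (current_tb + new_tb - deleted_tb) * 1000
    storage_cost = stored_gb * storage_rate_per_gb
    # Storage is roughly 95% of a typical B2 bill, so gross it up to a total.
    return storage_cost / (1 - download_share - transaction_share)

print(forecast_monthly_cost(100, 10, 5))  # about $552.63 for 105 TB stored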

Let’s Give it a Try

Let’s start with 100 TB of original storage then add 10 TB each month and delete 5 TB each month. That’s 105 TB of storage for the first month. Backblaze has built a cloud storage calculator that computes costs for all of the major cloud storage providers. Using this calculator, we find that Amazon S3 would cost $2,205.50 to store this data for a month, while Backblaze B2 would charge just $525.10.

Using those numbers for storage and assuming that storage will be 95% of your total bill (as noted in the chart above), you get a total monthly cost of $2,321.05 for Amazon S3 and Backblaze B2 will be $552.74 a month.

The chart below provides the breakdown of the expected cost.

                Backblaze B2    Amazon S3
Storage         $525.10         $2,205.50
Download        $11.06          $42.22
Transactions    $16.58          $69.33
Totals:         $552.74         $2,321.05

Of course each month you will add and delete storage, so you’ll have to account for that in your forecast. Using the cloud storage calculator noted above, you can get a good sense of your total cost over the budget forecasting period.

Finally, you can use the Backblaze B2 storage calculator to address potential use cases that are outside of your normal operation. For example, you delete a large project from your storage or you need to download a large amount of data. Running the calculator for these types of actions lets you obtain a solid estimate for their effect on your budget before they happen and lets you plan accordingly.

Creating a predictable cloud storage forecast is key to taking full advantage of all of the value in cloud storage. Organizations like Austin City Limits, Fellowship Church, and Panna Cooking were able to move to the cloud because they could reliably predict their cloud storage cost with Backblaze B2. You don’t have to let pricing tiers, hidden costs and fine print stop you. Backblaze makes predicting your cloud storage costs easy.

The post Five Tips For Creating a Predictable Cloud Storage Budget appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

Hard Drive Stats for Q2 2018

Post Syndicated from Andy Klein original https://www.backblaze.com/blog/hard-drive-stats-for-q2-2018/

Backblaze Drive Stats Q2 2018

As of June 30, 2018 we had 100,254 spinning hard drives in Backblaze’s data centers. Of that number, there were 1,989 boot drives and 98,265 data drives. This review looks at the quarterly and lifetime statistics for the data drive models in operation in our data centers. We’ll also take another look at comparing enterprise and consumer drives, get a first look at our 14 TB Toshiba drives, and introduce you to two new SMART stats. Along the way, we’ll share observations and insights on the data presented and we look forward to you doing the same in the comments.

Hard Drive Reliability Statistics for Q2 2018

Of the 98,265 hard drives we were monitoring at the end of Q2 2018, we removed from consideration those drives used for testing purposes and those drive models for which we did not have at least 45 drives. This leaves us with 98,184 hard drives. The table below covers just Q2 2018.

Backblaze Q2 2018 Hard Drive Failure Rates

Notes and Observations

If a drive model has a failure rate of 0%, it just means that there were no drive failures of that model during Q2 2018.

The Annualized Failure Rate (AFR) for Q2 is just 1.08%, well below the Q1 2018 AFR and our lowest quarterly AFR yet. That said, quarterly failure rates can be volatile, especially for models that have a small number of drives and/or a small number of Drive Days.

There were 81 drives (98,265 minus 98,184) that were not included in the list above because we did not have at least 45 of a given drive model. We use 45 drives of the same model as the minimum number when we report quarterly, yearly, and lifetime drive statistics. The use of 45 drives is historical in nature as that was the number of drives in our original Storage Pods.

Hard Drive Migrations Continue

The Q2 2018 Quarterly chart above was based on 98,184 hard drives. That was only 138 more hard drives than Q1 2018, which was based on 98,046 drives. Yet, we added nearly 40 PB of cloud storage during Q2. If we tried to store 40 PB on the 138 additional drives we added in Q2, then each new hard drive would have to store nearly 300 TB of data. While 300 TB hard drives would be awesome, the less awesome reality is that we replaced over 4,600 4 TB drives with nearly 4,800 12 TB drives.

The age of the 4 TB drives being replaced was between 3.5 and 4 years. In all cases their failure rates were 3% AFR (Annualized Failure Rate) or less, so why remove them? Simple: drive density — in this case three times the storage in the same cabinet space. Today, four years of service is about the time when it makes financial sense to replace existing drives versus building out a new facility with new racks, etc. While there are several factors that go into the decision to migrate to higher density drives, keeping hard drives beyond that tipping point means we would be underutilizing valuable data center real estate.

Toshiba 14 TB drives and SMART Stats 23 and 24

In Q2 we added twenty 14 TB Toshiba hard drives (model: MG07ACA14TA) to our mix (not enough to be listed on our charts), but that will change as we have ordered an additional 1,200 drives to be deployed in Q3. These are 9-platter, helium-filled drives that use CMR/PMR (not SMR) recording technology.

In addition to being new drives for us, the Toshiba 14 TB drives also add two new SMART stat pairs: SMART 23 (Helium condition lower) and SMART 24 (Helium condition upper). Both attributes report normalized and raw values, with the raw values currently being 0 and the normalized values being 100. As we learn more about these values, we’ll let you know. In the meantime, those of you who utilize our hard drive test data will need to update your data schema and upload scripts to read in the new attributes.

By the way, none of the 20 Toshiba 14 TB drives have failed after 3 weeks in service, but it is way too early to draw any conclusions.

Lifetime Hard Drive Reliability Statistics

While the quarterly chart presented earlier gets a lot of interest, the real test of any drive model is over time. Below is the lifetime failure rate chart for all the hard drive models in operation as of June 30th, 2018. For each model, we compute its reliability starting from when it was first installed.

Backblaze Lifetime Hard Drive Failure Rates

Notes and Observations

The combined AFR for all of the larger drives (8-, 10- and 12 TB) is only 1.02%. Many of these drives were deployed in the last year, so there is some volatility in the data, but we would expect this overall rate to decrease slightly over the next couple of years.

The overall failure rate for all hard drives in service is 1.80%. This is the lowest we have ever achieved, besting the previous low of 1.84% from Q1 2018.

Enterprise versus Consumer Hard Drives

In our Q3 2017 hard drive stats review, we compared two Seagate 8 TB hard drive models: one a consumer class drive (model: ST8000DM002) and the other an enterprise class drive (model: ST8000NM0055). Let’s compare the lifetime annualized failure rates from Q3 2017 and Q2 2018:

Lifetime AFR as of Q3 2017

    – 8 TB consumer drives: 1.1% annualized failure rate
    – 8 TB enterprise drives: 1.2% annualized failure rate

Lifetime AFR as of Q2 2018

    – 8 TB consumer drives: 1.03% annualized failure rate
    – 8 TB enterprise drives: 0.97% annualized failure rate

Hmmm, it looks like the enterprise drives are “winning.” But before we declare victory, let’s dig into a few details.

  1. Let’s start with drive days, the total number of days all the hard drives of a given model have been operational.
    – 8 TB consumer (model: ST8000DM002): 6,395,117 drive days
    – 8 TB enterprise (model: ST8000NM0055): 5,279,564 drive days
    Both models have a sufficient number of drive days and are reasonably close in their total number. No change to our conclusion so far.
  2. Next we’ll look at the confidence intervals for each model to see the range of possibilities within two deviations (a rough way to compute such a range is sketched after this list):
    – 8 TB consumer (model: ST8000DM002): range 0.9% to 1.2%
    – 8 TB enterprise (model: ST8000NM0055): range 0.8% to 1.1%
    The ranges are close, but multiple outcomes are possible. For example, the consumer drive could be as low as 0.9% and the enterprise drive could be as high as 1.1%. This doesn’t help or hurt our conclusion.
  3. Finally we’ll look at drive age — actually average drive age to be precise. This is the average time in operational service, in months, of all the drives of a given model. We will start with the point in time when each drive model reached approximately the current number of drives. That way the addition of new drives (not replacements) will have a minimal effect.
    Annualized Hard Drive Failure Rates by Time
    When you constrain for drive count and average age, the AFR (annualized failure rate) of the enterprise drive is consistently below that of the consumer drive for these two drive models — albeit not by much.
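Here is the sketch mentioned in item 2: one common way to turn drive days and failures into an annualized failure rate with a rough two-sigma range, treating failures as a Poisson process. It is an illustration only, not necessarily the exact method behind our published charts, and the failure count below is inferred from the drive days and lifetime AFR quoted above rather than pulled from the raw data.

import math

def afr_with_interval(failures, drive_days):
    # Annualized failure rate plus a rough two-sigma range, assuming the
    # number of failures follows a Poisson distribution.
    drive_years = drive_days / 365.0
    afr = failures / drive_years
    sigma = math.sqrt(failures) / drive_years
    return afr, afr - 2 * sigma, afr + 2 * sigma

# 8 TB consumer drives (ST8000DM002): 6,395,117 drive days at a ~1.03% lifetime
# AFR implies roughly 180 failures (an inferred, illustrative figure).
print(afr_with_interval(180, 6_395_117))  # about (0.0103, 0.0087, 0.0118)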

Whether every enterprise model is better than every corresponding consumer model is unknown, but below are a few reasons you might choose one class of drive over another:

Reasons to choose enterprise drives:

  • Longer warranty: 5 vs. 2 years
  • More features, e.g., PowerChoice technology
  • Faster reads and writes

Reasons to choose consumer drives:

  • Lower price: up to 50% less
  • Similar annualized failure rate to enterprise drives
  • Uses less power

Backblaze is known to be “thrifty” when purchasing drives. When you purchase 100 drives at a time or are faced with a drive crisis, it makes sense to purchase consumer drives. When you start purchasing 100 petabytes’ worth of hard drives at a time, the price gap between enterprise and consumer drives shrinks to the point where the other factors come into play.

Hard Drives By the Numbers

Since April 2013, Backblaze has recorded and saved daily hard drive statistics from the drives in our data centers. Each entry consists of the date, manufacturer, model, serial number, status (operational or failed), and all of the SMART attributes reported by that drive. Currently there are over 100 million entries. The complete data set used to create the information presented in this review is available on our Hard Drive Test Data page. You can download and use this data for free for your own purpose. All we ask are three things: 1) you cite Backblaze as the source if you use the data, 2) you accept that you are solely responsible for how you use the data, and 3) you do not sell this data to anyone. It is free.

If you just want the summarized data used to create the tables and charts in this blog post, you can download the ZIP file containing the MS Excel spreadsheet.

Good luck and let us know if you find anything interesting in the comments below or by contacting us directly.

The post Hard Drive Stats for Q2 2018 appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

Backblaze Durability is 99.999999999% — And Why It Doesn’t Matter

Post Syndicated from Brian Wilson original https://www.backblaze.com/blog/cloud-storage-durability/

Dials that go to 11

One of the most often talked about, but least understood, metrics in our industry is the concept of “data durability.” It is often talked about in that nearly everyone quotes some number of nines, and it is least understood in that no one tells you how they actually computed the number or what they actually mean by it.

It strikes us as odd that so much of the world depends on the concept of RAID and Encodings, but the calculations are not standard or agreed upon. Different web calculators allow you to input some variables but not the correct or most important variables. In almost all cases, they obscure the math behind how they spit out their final numbers. There are a few research papers, but hardly a consensus. There just doesn’t seem to be an agreed upon standard calculation of how many “9s” are in the final result. We’d like to change that.

In the same spirit of transparency that leads us to publish our hard drive performance stats, open source our Reed-Solomon Erasure Code, and generally try to share as much of our underlying architecture as practical, we’d like to share our calculations for the durability of data stored with us.

We are doing this for two reasons:

  1. We believe that sharing, where practical, furthers innovation in the community.
  2. Transparency breeds trust. We’re in the business of asking customers to trust us with their data. It seems reasonable to demonstrate why we’re worthy of your trust.

11 Nines Data Durability for Backblaze B2 Cloud Storage

At the end of the day, the technical answer is “11 nines.” That’s 99.999999999%. Conceptually, if you store 1 million objects in B2 for 10 million years, you would expect to lose 1 file. There’s a higher likelihood of an asteroid destroying Earth within a million years, but that is something we’ll get to at the end of the post.

How to Calculate Data Durability

Amazon’s CTO put forth the X million objects over Y million years metaphor in a blog post. That’s a good way to think about it — customers want to know that their data is safe and secure.

When you send us a file or object, it is actually broken up into 20 pieces (“shards”). The shards overlap so that the original file can be reconstructed from any 17 of the original 20 pieces. We then store those pieces on different drives that sit in different physical places (we call those 20 drives a “tome”) to minimize the possibility of data loss. When one drive fails, we have processes in place to “rebuild” the data for that drive. So, to lose a file, four drives would have to fail before we had a chance to rebuild the first one.

The math on calculating all this is extremely complex. Making it even more interesting, we debate internally whether the proper calculation methodology is to use the Poisson distribution (the probability of continuous events occurring) or Binomial (the probability of discrete events). We spent a shocking amount of time debating this and believe that both arguments have merits. Rather than posit one absolute truth, we decided to publish the results of both calculations (spoiler alert: Either methodology tells you that your files are safe with Backblaze).

The math is difficult to follow unless you have some facility with advanced statistics. We’ll forgive you if you want to skip the sections entirely, just click here.

Poisson Distribution

When dealing with the probability of X number of events occurring in a fixed period of time, a good place to start is the Poisson distribution.[1]

For inputs, we use the following assumptions:[2]

The average rebuild time to achieve complete parity for any given B2 object with a failed drive is 6.5 days.
A given file uploaded to Backblaze is split into 20 “shards” or pieces. The shards are distributed across multiple drives in a way that any drive can fail and the file is fully recoverable — a file is not lost unless four drives were to fail in a given vault before they could be “rebuilt.” This rebuild is enabled through our Reed-Solomon Erasure Code. Once one drive fails, the other shards are used to “rebuild” the data on the original drive (creating, for all practical purposes, an exact clone of the original drive).

The rule of thumb we use is that for every 1 TB needed to be rebuilt, one should allow 1 day. So a 12 TB drive would, on average, be rebuilt after 12 days. In practice, that number may vary based on a variety of factors, including, but not limited to, our team attempting to clone the failed drive before starting the rebuild process. Based on whatever else may be happening at a given time, a single failed drive may also not be addressed for one day. (Remember, a single drive failure has a dramatically different implication than a hypothetical third drive failure within a given vault — different situations would call for different operational protocols.) For the purposes of this calculation, and a desire to provide simplicity where possible, we assumed an average of a one day lag time before we start the rebuild.

The annualized failure rate of a drive is 0.81%.
For the trailing 60 days while we were writing this post, our average drive failure rate was 0.81%. Long time readers of our blog will also note that hard drive failure rates in our environment have fluctuated over time. But we also factor in the availability of data recovery services including, but not limited to, those offered by our friends at DriveSavers. We estimate a 50% likelihood of full (100%) data recovery from a failed drive that’s sent to DriveSavers. That cuts the effective failure rate in half to 0.41%.

For our Poisson calculation, we use this formula:

P(k) = (e^(-lambda) * lambda^k) / k!

The values for the variables are:

  • Annual average failure rate = 0.0041 per drive per year on average
  • Interval or “period” = 156 hours (6.5 days)
  • Lambda = ((0.0041 * 20)/((365*24)/156)) = 0.00146027397 for every “interval or period”
  • e = 2.7182818284
  • k = 4 (we want to know the probability of 4 “events” during this 156 hour interval)

Here’s what it looks like:

P(4) = (e^(-0.00146027397) * 0.00146027397^4) / 4!

If you’re following along at home, type this into an infinite precision calculator:[3]

(2.7182818284^(-0.00146027397)) * (((0.00146027397)^4)/(4*3*2*1))

The sub result for 4 simultaneous drive failures in 156 hours = 1.89187284e-13. That means the probability of it NOT happening in 156 hours is (1 – 1.89187284e-13) which equals 0.999999999999810812715 (12 nines).

But there’s a “gotcha.” You actually should calculate the probability of it not happening by considering that there are 56 “156 hour intervals” in a given year. That calculation is:

= (1 – 1.89187284e-13)^56
= (0.999999999999810812715)^56
= 0.99999999999 (11 “nines”)
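If you would rather let Python do the arithmetic, this short sketch reproduces the numbers above using the same inputs:

import math

# Inputs copied from the assumptions above.
lam = (0.0041 * 20) / ((365 * 24) / 156)  # expected shard failures per 156-hour period
p4 = math.exp(-lam) * lam**4 / math.factorial(4)  # P(4 failures in one period)
annual_durability = (1 - p4) ** 56                # 56 such periods in a year

print(p4)                 # ~1.8919e-13
print(annual_durability)  # ~0.9999999999894, which rounds to 0.99999999999 (11 nines)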

Yes, while this post claims that Backblaze achieves 11 nines worth of durability, at least one of our internal calculations comes out to 12 nines. Why go with 11 and not 12?

  1. There are different methodologies to calculate the number, so we are publishing the most conservative result.
  2. It doesn’t matter (skip to the end of this post for more on that).

Binomial Distribution

For those interested in getting into the full detail of this calculation, we made a public repository on GitHub. It’s our view on how to calculate the durability of data stored with erasure coding, assuming a failure rate for each shard, and independent failures for each shard.

First, some naming. We will use these names in the calculations:

  • S is the total number of shards (data plus parity)
  • R is the repair time for a shard in days: how long it takes to replace a shard after it fails
  • A is the annual failure rate of one shard
  • F is the failure rate of a shard in R days
  • P is the probability of a shard failing at least once in R days
  • D is the durability of data over R days: not too many shards are lost

With erasure coding, your data remains intact as long as you don’t lose more shards than there are parity shards. If you do lose more, there is no way to recover the data.

One of the assumptions we make is that it takes R days to repair a failed shard. Let’s start with a simpler problem and look at the data durability over a period of R days. For a data loss to happen in this time period, more shards than there are parity shards would have to fail.

We will use A to denote the annual failure rate of individual shards. Over one year, the chance that a shard will fail is evenly distributed over all of the R-day periods in the year. We will use F to denote the failure rate of one shard in an R-day period:

F = A * (R / 365)

The probability of failure of a single shard in R days is approximately F, when F is small. The exact value, from the Poisson distribution is:

P = 1 - e^(-F)

Given the probability of one shard failing, we can use the binomial distribution’s probability mass function to calculate the probability of exactly n of the S shards failing:

P(exactly n of the S shards fail) = C(S, n) * P^n * (1 - P)^(S - n)

We also lose data if more than n shards fail in the period. To include those, we can sum the above formula for n through S shards, to get the probability of data loss in R days:

P(data loss in R days) = sum for k = n to S of C(S, k) * P^k * (1 - P)^(S - k)

The durability in each period is the inverse of that:

D = 1 - P(data loss in R days)

Durability over the full year happens when there’s durability in all of the periods, which is the product of probabilities:

durability over a year = D^(365 / R)

And that’s the answer!

 

For the full calculation and explanation, including our Python code, please visit the GitHub repo:

https://github.com/Backblaze/erasure-coding-durability/blob/master/calculation.ipynb
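For readers who want the shape of the calculation without opening the notebook, here is a condensed Python sketch of the model described above. The shard count, parity count, per-shard failure rate, and repair time are this post’s assumptions; the repository handles the details more carefully.

from math import comb, exp  # math.comb requires Python 3.8+

def annual_durability(shards=20, parity=3, afr=0.0041, repair_days=6.5):
    f = afr * repair_days / 365.0   # failure rate of one shard over the repair period
    p = 1 - exp(-f)                 # probability a shard fails at least once in that period
    # Data is lost in a period only if more than `parity` shards fail.
    loss = sum(comb(shards, k) * p**k * (1 - p)**(shards - k)
               for k in range(parity + 1, shards + 1))
    periods = 365.0 / repair_days
    return (1 - loss) ** periods

print(annual_durability())  # roughly eleven nines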

We’d Like to Assure You It Doesn’t Matter

For anyone in the data business, durability and reliability are very serious issues. Customers want to store their data and know it’s there to be accessed when it’s needed. Any relevant system in our industry must be designed with a number of protocols in place to ensure the safety of our customers’ data.

But at some point, we all start sounding like the guitar player for Spinal Tap. Yes, our nines go to 11. Where is that point? That’s open for debate. But somewhere around the 8th nine we start moving from practical to purely academic.[4] Why? Because at these probability levels, it’s far more likely that:

  • An armed conflict takes out data center(s).
  • Earthquakes / floods / pests / or other events known as “Acts of God” destroy multiple data centers.
  • There’s a prolonged billing problem and your account data is deleted.

That last one is particularly interesting. Any vendor selling cloud storage relies on billing its customers. If a customer stops paying, after some grace period, the vendor will delete the data to free up space for a paying customer.

Some customers pay by credit card. We don’t have the math behind it, but we believe there’s a greater than 1 in a million chance that the following events could occur:

  • You change your credit card provider. The credit card on file is invalid when the vendor tries to bill it.
  • Your email service provider thinks billing emails are SPAM. You don’t see the emails coming from your vendor saying there is a problem.
  • You do not answer phone calls from numbers you do not recognize; Customer Support is trying to call you from a blocked number; they are trying to leave voicemails but the mailbox is full.

If all those things are true, it’s possible that your data gets deleted simply because the system is operating as designed.

What’s the Point? All Hard Drives Will Fail. Design for Failure.

Durability should NOT be taken lightly. Backblaze, like all the other serious cloud providers, dedicates valuable time and resources to continuously improving durability. As shown above we have 11 nines of durability. More importantly, we continually invest in our systems, processes, and people to make improvements.

Any vendor that takes the obligation to protect customer data seriously is deep into “designing for failure.” That requires building fault tolerant systems and processes that help mitigate the impact of failure scenarios. All hard drives will fail. That is a fact. So the question really is “how have you designed your system so it mitigates failures of any given piece?”

Backblaze’s architecture uses erasure coding to reliably store any given file in multiple physical locations (mitigating specific types of failures, like a faulty power strip). Backblaze’s business model is profitable and self-sustaining and provides us with the resources and wherewithal to make the right decisions. We also choose to do things like publish our hard drive failure rates, our cost structure, and this post. And we have a number of ridiculously intelligent, hardworking people dedicated to improving our systems. Why? Because the obligation to protect your data goes far beyond the academic calculation of “durability” as defined by hard drive failure rates.

Eleven years in and counting, with over 600 petabytes of data stored from customers across 160 countries, and well over 30 billion files restored, we confidently state that our system has scaled successfully and is reliable. The numbers bear it out and the experiences of our customers prove it.

And that’s the bottom line for data durability.


[1] One aspect of the Poisson distribution is that it assumes that the probability of failure is constant over time. Hard drives, in Backblaze’s environment, exhibit a “bathtub curve” for failures (higher likelihood of failure when they are first turned on and at the forecasted end of usable life). While we ran various internal models to account for that, it didn’t have a practical effect on the calculation. In addition, there’s some debate to be had about what the appropriate model is — at Backblaze, hard drives are thoroughly tested before they are put into our production system (affecting the theoretical extreme front end of the bathtub curve). Given all that, for the sake of a semblance of simplicity, we present a straightforward Poisson calculation.

[2] This is an area where we should emphasize the conceptual nature of this exercise. System design and reality can diverge.

[3] The complexity will break most standard calculators.

[4] Previously, Backblaze published its durability to be 8 nines. At the time, it reflected what we knew about drive failure rates and recovery times. Today, the failure rates are favorable. In addition, we’ve worked on and continue to innovate solutions around speeding up drive replacement time.

The post Backblaze Durability is 99.999999999% — And Why It Doesn’t Matter appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

B2 Quickstart Guide

Post Syndicated from Roderick Bauer original https://www.backblaze.com/blog/b2-quickstart-guide/

B2 Quickstart Guide
If you’re ready to get started with B2 Cloud Storage, these tips and resources will quickly get you up and running with B2.

What can you do with B2, our low-cost, fast, and easy object cloud storage service?

  • Creative professionals can archive their valuable photos and videos
  • Back up, archive, or sync servers and NAS devices
  • Replace tape backup systems (LTO)
  • Host and serve text, photos, videos, and other web content
  • Build apps that demand affordable cloud storage

B2 cloud storage logo

If you haven’t created an account yet, you can do that here.

Just For Newbies

Are you a beginner to B2? Here’s just what you need to get started.

Saving photos to the cloud

Developer or Old Hat at the Cloud?

If you’re a developer or more experienced with cloud storage, here are some resources just for you.

diagram of how to save files to the cloud

Have a NAS You’d Like to Link to the Cloud?

Would you like to get more out of your Synology, QNAP, or Morro Data NAS? Take a look at these posts that enable you to easily extend your local data storage to the cloud.

diagram of NAS to cloud backup

Looking for an Integration to Super Charge the Cloud?

We’ve blogged about the powerful integrations that work with B2 to provide solutions for a wide range of backup, archiving, media management, and computing needs. The links below are just a start. You can visit our Integrations page or search our blog for the integration that interests you.

diagram of cloud computing integrations

We hope that gets you started. There’s plenty more about B2 on our blog in the “Cloud Storage” category and in our Knowledge Base.

Didn’t find what you need to get started with B2? Ask away in the comments.

The post B2 Quickstart Guide appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

Five Cool Multi-Cloud Backup Solutions You Should Consider

Post Syndicated from Roderick Bauer original https://www.backblaze.com/blog/multi-cloud-backup-solutions/

Multi-Cloud backup

We came across Kevin Soltow’s blog, VMWare Blog, and invited him to contribute a post on a topic that’s getting a lot of attention these days: using multiple public clouds for storing data to increase data redundancy and also to save money. We hope you enjoy the post.
Kevin Soltow
Kevin lives in Zumikon, Switzerland, where he works as a Storage Consultant specializing in VMware technologies and storage virtualization. He graduated from the Swiss Federal Institute of Technology in Zurich, and now works on implementing disaster recovery and virtual SAN solutions.

Nowadays, it’s hard to find a company without a backup or DR strategy in place. An organization’s data has become one of its most important assets, so making sure it remains safe and available are key priorities. But does it really matter where your backups are stored? If you follow the “3-2-1” backup rule, you know the answer. You should have at least three copies of your data, two of which are local but on different media, and at least one copy is offsite. That all sounds reasonable.

What about the devices to store your backup data?

Tapes — Tapes came first. They offer large capacity and can retain data for a long time, but they are slow. Historically, they have been less expensive than disks.

Disks — Disks have great capacity, are more durable and faster than tapes, and are improving rapidly in capacity, speed, and cost-per-unit-stored.

In a previous post, Looking for the most affordable cloud storage? AWS vs Azure vs Backblaze B2, I looked at the cost of public cloud storage. With reasonably-priced cloud services available, cloud can be that perfect offsite option for keeping your data safe.

No cloud storage provider can guarantee 100% accessibility and security. Sure, they get close to this number, with claims of 99-point-some-number-of-nines durability, but an unexpected power outage or disaster could knock out their services for minutes, hours, or longer. This happened last year to Amazon S3 when the service suffered a disruption. This year, S3 was down due to a power outage. Fortunately, Amazon did not lose their customers’ data. The key words in the 3-2-1 backup rule are at least one copy offsite. More is always better.

Keeping data in multiple clouds provides a clear advantage for reliability, and it can provide cost savings as well. Using multiple cloud providers can simultaneously provide geographically dispersed backups while taking advantage of the lower storage costs available from competitively priced cloud providers.

In this post, we take a closer look at solutions that support multiple public clouds and allow you to keep several backup copies in different and dispersed clouds.

The Backup Process

The backup process is illustrated in the figure below:

diagram of the backup process from local to cloud

Some solutions create backups and move them to the repository. Data is kept there for a while and then shifted to the cloud where it stays as long as needed.

In this post, I discuss the dedicated software serving as a “data router,” in other words, the software involved in the process of moving data from some local repository to one or more public clouds.

software to send backups to cloud diagram

Let’s have a look at the options we have to achieve this.

1 — Rclone

When I considered solutions that let you back up your data to several clouds, Rclone and CloudBerry were the first that popped into my head. Rclone acts as a data mover, synchronizing your local repository with cloud-based object storage. You basically create a backup using something else (e.g. Veeam Backup & Replication), store it on-premises, and the solution sends it to several clouds. First developed for Linux, Rclone has a command-line interface to sync files and directories between clouds.

OS Compatibility

The solution can be run on all OS’s using the command-line interface.

Cloud Support

The solution works with most popular public cloud storage platforms, such as Backblaze B2, Microsoft Azure, Amazon S3 and Glacier, Google Cloud Platform, etc.

Feature set

Rclone commands work with virtually any remote storage system, whether public cloud storage or just a backup server located somewhere else. It can also send data to multiple destinations simultaneously, but bi-directional sync does not work yet. In other words, changes you make to your files in the cloud do not affect their local copies. The synchronization process is incremental on a file-by-file basis. It should also be noted that Rclone preserves timestamps on files, which helps when searching for the right backup.

This solution provides two options for moving data to the cloud: sync and copy. The first, sync, makes the files in the cloud identical to those in the specified local directory, so files removed locally are also removed from the cloud. The second mode, copy, as expected from its name, only copies data from on-premises to the cloud; deleting your files locally won’t affect the ones stored in the cloud. There’s also an option to verify hash equality.
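
For readers who script their backup workflows, here is a small illustrative Python wrapper around those two rclone modes. It assumes rclone is installed and that a B2 remote (here called “b2remote”) has already been configured with rclone config; the directory and bucket names are placeholders.

```python
# Illustrative wrapper around rclone's "sync" and "copy" modes.
# Assumes rclone is installed and a remote named "b2remote" was set up
# beforehand with `rclone config`; paths and bucket names are placeholders.
import subprocess

def push_backups(local_dir, remote_path, mode="copy", dry_run=True):
    """Push a local backup repository to cloud storage via rclone.

    mode="sync": make the destination identical to the source
                 (local deletions are propagated to the cloud).
    mode="copy": only add or update files; local deletions are ignored.
    """
    if mode not in ("sync", "copy"):
        raise ValueError("mode must be 'sync' or 'copy'")
    cmd = ["rclone", mode, local_dir, remote_path, "--verbose"]
    if dry_run:
        cmd.append("--dry-run")  # report what would change without transferring
    subprocess.run(cmd, check=True)

# Example: mirror the on-premises backup repository into a B2 bucket.
push_backups("/srv/backups/veeam", "b2remote:my-backup-bucket/veeam",
             mode="sync", dry_run=False)
```

Running with dry_run=True first is a cheap way to see which files a sync would delete before letting it touch the cloud copy.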

Learn more about Rclone: https://rclone.org/

2 — CloudBerry Backup

CloudBerry Backup is built from the self-titled backup technology developed for service providers and enterprise IT departments. It is a cross-platform solution. Note that it’s full-fledged backup software, allowing you to not only move backups to the cloud but also create them.

OS compatibility

CloudBerry is a cross-platform solution.

Cloud Support

So far, the solution can talk to Backblaze B2, Microsoft Azure, Amazon S3 and Glacier, Google Cloud Platform, and more.

Feature set

Designed for large IT departments and managed service providers, CloudBerry Backup provides some features that make the solution really handy for the big guys. It offers the opportunity for client customization up to and including the complete rebranding of the solution.

Let’s look at the backup side of this thing. The solution allows backing up files and directories manually. If you prefer, you can sync a selected directory to the root of the bucket. CloudBerry Backup can also schedule backups, so you won’t miss them. Another cool thing is backup job management and monitoring. Thanks to this feature, you are always aware of backup processes on the client machines.

The solution offers AES 256-bit end-to-end encryption to ensure your data safety.

Learn more about CloudBerry Backup: https://www.cloudberrylab.com/managed-backup.aspx

Read Backblaze’s blog post, How To Back Up Your NAS with B2 and CloudBerry.

3 — StarWind VTL

Some organizations still use Virtual Tape Library (VTL), but want to sync their tape objects to the cloud as well.

OS compatibility

This product is available only for Windows.

Cloud Support

So far, StarWind VTL can talk to popular cloud storage platforms like Backblaze B2, AWS S3 and Glacier, and Microsoft Azure.

Feature set

The product has many features for anyone who wants to back up to the cloud. First, it allows sending data to the appropriate cloud tier, with subsequent automatic de-staging. This automation is what makes StarWind VTL really cool. Second, the product supports both on-premises and public cloud object storage. Third, StarWind VTL supports deduplication and compression, making storage utilization more efficient. The solution also allows client-side encryption.

StarWind supports standard LTO (Linear Tape-Open) protocols. This appeals to organizations that have LTO in place, since it allows adoption of more scalable, cost-efficient cloud storage without having to update the internal backup infrastructure.

All operations in the StarWind VTL environment are done via the Management Console and the Web-Console, a web interface that makes VTL manageable from any browser.

Learn more about StarWind Virtual Tape Library: https://www.starwindsoftware.com/starwind-virtual-tape-library

Also, see Backblaze’s post on StarWind VTL: Connect Veeam to the B2 Cloud: Episode 2 — Using StarWind VTL

4 — Duplicati

Duplicati was designed from scratch for online backups. It can send your data directly to multiple clouds or use local storage as a backend.

OS compatibility

It is free and compatible with Windows, macOS, and Linux.

Cloud Support

So far, the solution talks to Backblaze B2, Amazon S3, Mega, Google Cloud Storage, and Microsoft Azure.

Feature set

Duplicati has some awesome features. First, the solution is free; notably, its team does not restrict using the software for free even for commercial purposes. Second, Duplicati employs decent encryption, compression, and deduplication, making your storage more efficient and safe. Third, the solution adds timestamps to your files, so you can easily find a specific backup. Fourth, the backup scheduler helps make users’ lives simpler, so you won’t miss a backup.

What makes this piece of software special and really handy is backup content verification. Indeed, you never know whether a backup is good until you actually restore from it. Thanks to this feature, you can pinpoint any problems before it is too late.

Duplicati is managed via a web interface, making it possible to access from anywhere and any platform.

Learn more about Duplicati: https://www.duplicati.com/.

Read Backblaze’s post on Duplicati: Duplicati, a Free, Cross-Platform Client for B2 Cloud Storage.

5 — Duplicacy

Duplicacy supports popular public cloud storage services. Apart from the cloud, it can use SFTP servers and NAS boxes as backends.

OS compatibility

The solution is compatible with Windows, Mac OS X, and Linux.

Cloud Support

Duplicacy can offload data to Backblaze B2, Amazon S3, Google Cloud Storage, Microsoft Azure, and more.

Feature set

Duplicacy not only routes your backups to the cloud but also creates them. Each backup created by this solution is incremental, yet each is treated as a full snapshot, allowing simpler restoration, deletion, and movement of backups between storage sites. Duplicacy sends your files to multiple cloud storage services and uses strong client-side encryption. Another cool thing about this solution is its ability to provide multiple clients with simultaneous access to the same storage.

Duplicacy has a comprehensive GUI that features one-page configuration for quick backup scheduling and managing retention policies. If you are a command-line interface fan, you can manage Duplicacy via the command line.

Learn more about Duplicacy: https://duplicacy.com/

Read Backblaze’s Knowledge Base article, How to upload files to B2 with Duplicacy.

So, Should You Store Your Data in More Than One Cloud?

Undoubtedly, keeping a copy of your data in the public cloud is a great idea and enables you to comply with the 3-2-1 backup rule. By going beyond that and adopting a multi-cloud strategy, it is possible to save money and also gain additional data redundancy and security by having data in more than one public cloud service.

As I’ve covered in this post, there are a number of wonderful backup solutions that can talk to multiple public cloud storage services. I hope this article proves useful to you and you’ll consider employing one of the reviewed solutions in your backup infrastructure.

Kevin Soltow

The post Five Cool Multi-Cloud Backup Solutions You Should Consider appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

What’s the Diff: VMs vs Containers

Post Syndicated from Roderick Bauer original https://www.backblaze.com/blog/vm-vs-containers/

What's the Diff: Containers vs VMs

Both VMs and containers can help get the most out of available computer hardware and software resources. Containers are the new kids on the block, but VMs have been, and continue to be, tremendously popular in data centers of all sizes.

If you’re looking for the best solution for running your own services in the cloud, you need to understand these virtualization technologies, how they compare to each other, and the best uses for each. Here’s our quick introduction.

Basic Definitions — VMs and Containers

What are VMs?

A virtual machine (VM) is an emulation of a computer system. Put simply, it makes it possible to run what appear to be many separate computers on hardware that is actually one computer.

The operating systems (“OS”) and their applications share hardware resources from a single host server, or from a pool of host servers. Each VM requires its own underlying OS, and the hardware is virtualized. A hypervisor, or a virtual machine monitor, is software, firmware, or hardware that creates and runs VMs. It sits between the hardware and the virtual machine and is necessary to virtualize the server.

Since the advent of affordable virtualization technology and cloud computing services, IT departments large and small have embraced virtual machines (VMs) as a way to lower costs and increase efficiencies.

Virtual Machine System Architecture Diagram

VMs, however, can take up a lot of system resources. Each VM runs not just a full copy of an operating system, but a virtual copy of all the hardware that the operating system needs to run. This quickly adds up to a lot of RAM and CPU cycles. That’s still economical compared to running separate actual computers, but for some applications it can be overkill.

That led to the development of containers.

Benefits of VMs

  • All OS resources available to apps
  • Established management tools
  • Established security tools
  • Better known security controls
Popular VM Providers

What are Containers?

With containers, instead of virtualizing the underlying computer like a virtual machine (VM), just the OS is virtualized.

Containers sit on top of a physical server and its host OS — typically Linux or Windows. Each container shares the host OS kernel and, usually, the binaries and libraries, too. Shared components are read-only. Sharing OS resources such as libraries significantly reduces the need to reproduce the operating system code, and means that a server can run multiple workloads with a single operating system installation. Containers are thus exceptionally “light” — they are only megabytes in size and take just seconds to start. In contrast, VMs take minutes to start and are an order of magnitude larger than an equivalent container.

In contrast to VMs, all that a container requires is enough of an operating system, supporting programs and libraries, and system resources to run a specific program. What this means in practice is that you can put two to three times as many applications on a single server with containers as you can with VMs. In addition, with containers you can create a portable, consistent operating environment for development, testing, and deployment.

Containers System Architecture Diagram

Types of Containers

Linux Containers (LXC) — The original Linux container technology is Linux Containers, commonly known as LXC. LXC is a Linux operating system level virtualization method for running multiple isolated Linux systems on a single host.

Docker — Docker started as a project to build single-application LXC containers, introducing several changes to LXC that make containers more portable and flexible to use. It later morphed into its own container runtime environment. At a high level, Docker is a Linux utility that can efficiently create, ship, and run containers.

Benefits of Containers

  • Reduced IT management resources
  • Reduced size of snapshots
  • Quicker spinning up apps
  • Reduced & simplified security updates
  • Less code to transfer, migrate, upload workloads
Popular Container Providers

Uses for VMs vs Uses for Containers

Both containers and VMs have benefits and drawbacks, and the ultimate decision will depend on your specific needs, but there are some general rules of thumb.

  • VMs are a better choice for running apps that require all of the operating system’s resources and functionality, when you need to run multiple applications on servers, or have a wide variety of operating systems to manage.
  • Containers are a better choice when your biggest priority is maximizing the number of applications running on a minimal number of servers.
What’s the Diff: VMs vs. Containers
VMs | Containers
Heavyweight | Lightweight
Limited performance | Native performance
Each VM runs in its own OS | All containers share the host OS
Host OS can be different than the guest OS | Host OS and container OS are the same
Startup time in minutes | Startup time in milliseconds
Hardware-level virtualization | OS virtualization
Allocates required memory | Requires less memory space
Fully isolated and hence more secure | Process-level isolation and hence less secure

For most, the ideal setup is likely to include both. With the current state of virtualization technology, the flexibility of VMs and the minimal resource requirements of containers work together to provide environments with maximum functionality.

If your organization is running a large number of instances of the same operating system, then you should look into whether containers are a good fit. They just might save you significant time and money over VMs.

Are you Using VMs, Containers, or Both?

We will explore this topic in greater depth in subsequent posts. If you are using VMs or containers, we’d love to hear from you about what you’re using and how you’re using them.

The post What’s the Diff: VMs vs Containers appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

Panna Cooking Creates the Perfect Storage Recipe with Backblaze and 45 Drives

Post Syndicated from Andy Klein original https://www.backblaze.com/blog/panna-cooking-creates-perfect-storage-recipe/

Panna Cooking custard dessert with strawberries

Panna Cooking is the smart home cook’s go-to resource for learning to cook, video recipes, and shopping lists, all delivered by 40+ of the world’s best chefs. Video is the primary method Panna uses to communicate with their customers. Joshua Stenseth is a full-time editor and part-time IT consultant in charge of wrangling the video content Panna creates.

Like many organizations, digital media archive storage wasn’t top of mind over the years at Panna Cooking. Over time, more and more media projects were archived to various external hard drives and dutifully stored in the archive closet. The external drive archive solution was inexpensive and was easy to administer for Joshua, until…

Joshua stared at the request from the chef. She wanted to update a recipe video from a year ago to include in her next weekly video. The edits being requested were the easy part as the new footage was ready to go. The trouble was locating the old digital video files. Over time, Panna had built up a digital video archive that resided on over 100 external hard drives scattered around the office. The digital files that Joshua needed were on one of those drives.

Panna Cooking, like many growing organizations, learned that the easy-to-do tasks, like using external hard drives for data archiving, don’t scale very well. This is especially true for media content such as video and photographs. It is easy to get overwhelmed.

Panna Cooking dessert

Joshua was given the task of ensuring all the content created by Panna was economically stored, readily available, and secured off site. The Panna Cooking case study details how Joshua was able to consolidate their existing scattered archive by employing the Hybrid Cloud Storage package from 45 Drives, a flexible, highly affordable, media archiving solution consisting of a 45 Drives Storinator storage server, Rclone, Duplicity, and B2 Cloud Storage.

45 Drives’ innovative Hybrid Cloud Storage package and its partnership with Backblaze B2 Cloud Storage were the perfect solution. The Hybrid Cloud Storage package installs on the Storinator system and utilizes Rclone or Duplicity to back up or sync files to the B2 cloud. This gave Panna a fully operational local storage system that sends changes automatically to the B2 cloud. For Joshua and his fellow editors, the Storinator/Backblaze B2 solution delivered the best of both worlds: high-performance local storage and readily accessible, affordable off-site cloud storage, all while eliminating their old archive, the closet full of external hard drives.

The post Panna Cooking Creates the Perfect Storage Recipe with Backblaze and 45 Drives appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

How to Connect Your QNAP NAS to B2 Cloud Storage

Post Syndicated from Roderick Bauer original https://www.backblaze.com/blog/qnap-nas-backup-to-cloud/

QNAP NAS and B2 Cloud Storage

Network-Attached-Storage (NAS) devices are great for local backups and archives of data. They have become even more capable, now often taking over functions that used to be reserved for servers.

QNAP produces a popular line of networking products, including NAS units that can work with Macintosh, Windows, Linux, and other OS’s. QNAP’s NAS products are used in office, home, and professional environments for storage and a variety of applications, including business, development, home automation, security, and entertainment.

Data stored on a QNAP NAS can be backed up to Backblaze B2 Cloud Storage using QNAP’s Hybrid Backup Sync application, which consolidates backup, restoration and synchronization functions into a single QTS application. With the latest releases of QTS and Hybrid Backup Sync (HBS), you can now sync the data on your QNAP NAS to and from Backblaze B2 Cloud Storage.

How to Set up QNAP’s Hybrid Backup Sync to Work With B2 Cloud Storage

To set up your QNAP with B2 sync support, you’ll need access to your B2 account. You’ll also need your B2 Account ID, application key and bucket name — all of which are available after you log into your Backblaze account. Finally, you’ll need the Hybrid Backup Sync application installed in QTS on your QNAP NAS. You’ll need QTS 4.3.3 or later and Hybrid Backup Sync v2.1.170615 or later.

  1. Open the QTS desktop in your web browser.

QNAP QTS Desktop

  2. If it’s not already installed, install the latest Hybrid Backup Sync from the App Center.

QNAP QTS AppCenter

  3. Click on Hybrid Backup Sync from the desktop.
  4. Click the Sync button to create a new connection to B2.

QNAP Hybrid Backup Sync

  5. Select “One-way Sync” and “Sync with the cloud.”

QNAP Hybrid Backup Sync -- Create Sync Job

  6. Select “Local to cloud sync.”

QNAP Hybrid Backup Sync -- Sync with the cloud

  7. Select an existing Account (job) or just select “Backblaze B2” to create a new one.

QNAP Hybrid Backup Sync -- Select Account

  8. Enter a display name for this job, and your Account ID and Application key for your Backblaze B2 account.

QNAP Hybrid Backup Sync -- Create Account

  9. Select the source folder on the NAS you’d like to sync, and the bucket name and folder name on B2 for the destination. If you’d like to sync immediately, select the “Sync Now” checkbox. Click “Advanced Settings” if you’d like to configure a backup schedule, select client-side encryption, compression, filters, file replacement policies, and other options. Click “Apply.” If you selected “Sync Now” your job will start.

QNAS Hybrid Backup Sync -- Create Sync Job

QNAP Hybrid Backup Sync -- Advanced Settings

  10. After you’ve finished configuring your job, you will see the “All Jobs” dialog with the status of all your jobs.

QNAP Hybrid Backup Sync -- All Jobs

What Can You Do With B2 and QNAP Hybrid Backup Sync?

The Hybrid Backup Sync app provides you with total control over what gets backed up to B2. You can synchronize in the cloud as little or as much as you want. Here are some practical examples of what you can do with Hybrid Backup Sync and B2 working together.

1 — Sync the Entire Contents of your QNAP to the Cloud

The QNAP NAS has excellent fault-tolerance — it can continue operating even when individual drive units fail — but nothing in life is foolproof. It pays to be prepared in the event of a catastrophe. If you follow our 3-2-1 Backup Strategy, you know how important it is to make sure that you have a copy of your files in the cloud.

2 — Sync Your Most Important Media Files

Using your QNAP to store movies, music and photos? You’ve invested untold amounts of time, money, and effort into collecting those media files, so make sure they’re safely and securely synced to the cloud with Hybrid Backup Sync and B2.

3 — Back Up Time Machine and Other Local Backups

Apple’s Time Machine software provides Mac users with reliable local backup, and many of our customers rely on it to provide that crucial first step in making sure their data is secure.

QNAP enables the NAS to act as a network-based Time Machine backup. Those Time Machine files can be synced to the cloud, so you can make sure to have Time Machine files to restore from in the event of a critical failure.

If you use Windows or Linux, you can configure the QNAP NAS as the destination for your Windows or Linux local data backup. That, in turn, can be synced to the cloud from the NAS.

Why B2?

B2 is the best value in cloud storage. The cost to store data in the B2 cloud is up to 75 percent less than the competition. You can see for yourself with our B2 Cost Calculator.

If you haven’t given B2 a try yet, now is the time. You can get started with B2 and your QNAP NAS right now, and make sure your NAS is synced securely and automatically to the cloud.

The post How to Connect Your QNAP NAS to B2 Cloud Storage appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

An Inside Look at Data Center Storage Integration: A Complex, Iterative, and Sustained Process

Post Syndicated from Andy Klein original https://www.backblaze.com/blog/firmware-for-data-center-optimization/

Data Center with Backblaze Pod

How and Why Advanced Devices Go Through Evolution in the Field

By Jason Feist, Seagate Senior Director for Technology Strategy and Product Planning

One of the most powerful features in today’s hard drives is the ability to update the firmware of deployed hard drives. Firmware changes can be straightforward, such as changing a power setting, or as delicate as adjusting the height a read/write head flies above a spinning platter. By combining customer inputs, drive statistics, and a myriad of engineering talents, Seagate can use firmware updates to optimize the customer experience for the workload at hand.

In today’s guest post we are pleased to have Jason Feist, Senior Director for Technology Strategy and Product Planning at Seagate, describe how the Seagate ecosystem works.

 —Andy Klein

Storage Devices for the Data Center: Both Design-to-Application and In-Field Design Updates Are Important

As data center managers bring new IT architectures online, and as various installed components mature, technology device makers release firmware updates to enhance device operation, add features, and improve interoperability. The same is true for hard drives.

Hardware design takes years; firmware design can unlock the ability for that same hardware platform to persist in the field at the best cost structure if updates are deployed for continuous improvement over the product life cycle. In close and constant consultation with data center customers, hard drive engineers release firmware updates to ensure products provide the best experience in the field. Having the latest firmware is critical to ensure optimal drive operation and data center reliability. Likewise, as applications evolve, performance and features can mature over time to more effectively solve customer needs.

Data Center Managers Must Understand the Evolution of Data Center Needs, Architectures, and Solutions

Scientists and engineers at advanced technology companies like Seagate develop solutions based on understanding customers’ applications up front. But the job doesn’t end there; we also continue to assess and tweak devices in the field to fit very specific and evolving customer needs.

Likewise, the data center manager or IT architect must understand many technical considerations when installing new hardware. Integrating storage devices into a data center is never a matter of choosing any random hard drive or SSD that features a certain capacity or a certain IOPS specification. The data center manager must know the ins and outs of each storage device, and how it impacts myriad factors like performance, power, heat, and device interoperability.

But after rolling out new hardware, the job is not done. In fact, the job’s never done. Data center devices continue to evolve, even after integration. The hardware built for data centers is designed to be updated on a regular basis, based on a continuous cycle of feedback from ever-evolving applications and implementations.

As continued in-field quality assurance activities and updates in the field maintain the device’s appropriate interaction with the data center’s evolving needs, a device will continue to improve in terms of interoperability and performance until the architecture and the device together reach maturity. Managing these evolving needs and technology updates is a critical factor in achieving the expected best TCO (total cost of ownership) for the data center.

It’s important for data center managers to work closely with device makers to ensure integration is planned and executed correctly, monitoring and feedback is continuous, and updates are developed and deployed. In recent years as cloud and hyperscale data centers have evolved, Seagate has worked hard to develop a powerful support ecosystem for these partners.

The Team of Engineers Behind Storage Integration

The key to creating a successful program is to establish an application engineering and technical customer management team that’s engaged with the customer. Our engineering team meets with large data center customers on an ongoing basis. We work together from the pre-development phase to the time we qualify a new storage device. We collaborate to support in-field system monitoring, and sustaining activities like analyzing the logs on the hard drives, consulting about solutions within the data center, and ensuring the correct firmware updates are in place on the storage devices.

The science and engineering specialties on the team are extensive and varied. Depending on the topics at each meeting, analysis and discussion require a breadth of engineering expertise. Dozens of engineering degrees and years of experience are on hand, including experts in firmware, servo control systems, mechanical engineering, tribology, electrical engineering, reliability, and manufacturing. The titles of contributing experts include computer engineers, aerospace engineers, test engineers, statisticians, data analysts, and material scientists. Within each discipline are unique specializations, such as ASIC engineers, channel technology engineers, and mechanical resonance engineers who understand shock and vibration factors.

The skills each engineer brings are necessary to understand the data customers are collecting and analyzing, how to deploy new products and technologies, and when to develop changes that’ll improve the data center’s architecture. It takes this team of engineering talent to comprehend the intricate interplay of devices, code, and processes needed to keep the architecture humming in harmony from the customer’s point of view.

How a Device Maker Works With a Data Center to Integrate and Sustain Performance and Reliability

After we establish our working team with a customer and when we’re introducing a new product for integration into a customer data center, we meet weekly to go over qualification status. We do a full design review on new features, consider the differences from the previous to the new design and how to address particular asks they may have for our next product design.

Traditionally, storage component designers would simply comply with whatever the T10 or T13 interface specification says. These days, many of the cloud data centers are asking for their own special sauce in some form, whether they’re trying to get a certain number of IOPS per terabyte, or trying to match their latency down to a certain number — for example, “I want to achieve four or five 9’s at this latency number; I want to be able to stream data at this rate; I want to have this power consumption.”

Recently, working with a customer to solve a specific need they had, we deployed Flex dynamic recording technology, which enables a single hard drive to use both SMR (Shingled Magnetic Recording) and CMR (Conventional Magnetic Recording, for example Perpendicular Recording) methods on the same drive media. This required a very high-level team integration with the customer. We spent great effort going back and forth on what an interface design should be, what a command protocol should be, and what the behavior should be in certain conditions.

Sometimes a drive design is unique to one customer, and sometimes it’s good for all our customers. There’s always a tradeoff; if you want really high performance, you’re probably going to pay for it with power. But when a customer asks for a certain special sauce, that drives us to figure out how to achieve that in balance with other needs. Then — similarly to when an automaker like Chevy or Honda builds race car engines and learns how to achieve new efficiency and performance levels — we can apply those new features to a broader customer set, and ultimately other customers will benefit too.

What Happens When Adjustments Are Needed in the Field?

Once a new product is integrated, we then continue to work closely from a sustaining standpoint. Our engineers interface directly with the customer’s team in the field, often in weekly meetings and sometimes even more frequently. We provide a full rundown on the device’s overall operation, dealing with maintenance and sustaining issues. For any error that comes up in the logs, we bring in an expert specific to that error to pore over the details.

In any given week we’ll have a couple of engineers in the customer’s data center monitoring new features and as needed debugging drive issues or issues with the customer’s system. Any time something seems amiss, we’ve got plans in place that let us do log analysis remotely and in the field.

Let’s take the example of a drive not performing as the customer intended. There are a number of reliability features in our drives that may interact with drive response — perhaps adding latency on the order of tens of milliseconds. We work with the customer on how we can manage those features more effectively. We help analyze the drive’s logs to tell them what’s going on and weigh the options. Is the latency a result of an important operation they can’t do without, and the drive won’t survive if we don’t allow that operation? Or is it something that we can defer or remove, prioritizing the workload goal?

How Storage Architecture and Design Has Changed for Cloud and Hyperscale

The way we work with cloud and data center partners has evolved over the years. Back when IT managers would outfit business data centers with turn-key systems, we were very familiar with the design requirements for traditional OEM systems with transaction-based workloads, RAID rebuild, and things of that nature. Generally, we were simply testing workloads that our customers ran against our drives.

As IT architects in the cloud space moved toward designing their data centers made-to-order, on open standards, they had a different notion of reliability and doing replication or erasure coding to create a more reliable environment. Understanding these workloads, gathering these traces, and getting this information back from these customers was important so we could optimize drive behavior under new and different design strategies: not just for performance, but for power consumption also. The number of drives populating large data centers is mind-boggling, and when you realize what the power consumption is, you realize how important it is to optimize the drive for that particular variable.

Turning Information Into Improvements

We have always executed a highly standardized set of protocols on drives in our lab qualification environment, using racks that are well understood. In these scenarios, the behavior of the drive is well understood. By working directly with our cloud and data center partners we’re constantly learning from their unique environments.

For example, the customer’s architecture may have big fans in the back to help control temperature, and the fans operate with variable levels of cooling: as things warm up, the fans spin faster. At one point we may discover these fan operations are affecting the performance of the hard drive in the servo subsystem. Some of the drive logging our engineers do has been brilliant at solving issues like that. For example, we’d look at our position error signal, and we could actually tell how fast the fan was spinning based on the adjustments the drive was making to compensate for the acoustic noise generated by the fans.

Information like this is provided to our servo engineering team when they’re developing new products or firmware so they can make loop adjustments in our servo controllers to accommodate the range of frequencies we’re seeing from fans in the field. Rather than having the environment throw the drive’s heads off track, our team can provide compensation to keep the heads on track and let the drives perform reliably in environments like that. We can recreate the environmental conditions and measurements in our shop to validate we can control it as expected, and our future products inherit these benefits as we go forward.

In another example, we can monitor and work to improve data throughput while also maintaining reliability by understanding how the data center environment is affecting the read/write head’s ability to fly with stability at a certain height above the disk platter while reading bits. Understanding the ambient humidity and the temperature is essential to controlling the head’s fly height. We now have an active fly-height control system with the controller-firmware system and servo systems operating based on inputs from sensors within the drive. Traditionally a hard drive’s fly-height control was calibrated in the factory — a set-and-forget kind of thing. But with this field adjustable fly-height capability, the drive is continually monitoring environmental data. When the environment exceeds certain thresholds, the drive will recalculate what that fly height should be, so it’s optimally flying and getting the best air rates, ideally providing the best reliability in the field.

The Benefits of in-Field Analysis

These days a lot of information can be captured in logs and gathered from a drive to be brought back to our lab to inform design changes. You’re probably familiar with SMART logs that have been traditional in drives; this data provides a static snapshot in time of the status of a drive. In addition, field analysis reliability logs measure environmental factors the drive is experiencing like vibration, shock, and temperature. We can use this information to consider how the drive is responding and how firmware updates might deal with these factors more efficiently. For example, we might use that data to understand how a customer’s data center architecture might need to change a little bit to enable better performance, or reduce heat or power consumption, or lower vibrations.

What Does This Mean for the Data Center Manager?

There’s a wealth of information we can derive from the field, including field log data, customers’ direct feedback, and what our failure analysis teams have learned from returned drives. By actively participating in the process, our data center partners maximize the benefit of everything we’ve jointly learned about their environment so they can apply the latest firmware updates with confidence.

Updating firmware is an important part of fleet management that many data center operators struggle with. Some data centers may continue to run outdated firmware even when an update is available because they don’t have clear policies for managing firmware. Or they may avoid updates because they’re unsure whether an update is right for their drive or their situation.

Would You Upgrade a Live Data Center?

Nobody wants their team to be responsible for allowing a server to go down due to a firmware issue. How will the team know when new firmware is available, and whether it applies to specific components in the installed configuration? One method is for IT architects to set up a regular quarterly schedule to review possible firmware updates for all data center components. At the least, devising a review and upgrade schedule requires maintaining a regular inventory of all critical equipment, and setting up alerts or pull-push communications with each device maker so the team can review the latest release notes and schedule time to install updates as appropriate.

Firmware sent to the field for the purpose of updating in-service drives undergoes the same rigorous testing that the initial code goes through. In addition, the payload is verified to be compatible with the code and drive model that’s being updated. That means you can’t accidentally download firmware that isn’t for the intended drive. There are internal consistency checks to reject invalid code. Also, to help minimize performance impacts, firmware downloads support segmented download; the firmware can be downloaded in small pieces (the user can choose the size) so they can be interleaved with normal system work and have a minimal impact on performance. The host can decide when to activate the new code once the download is completed.

In closing, working closely with data center managers and architects to glean information from the field is important because it helps bring Seagate’s engineering team closer to our customers. This is the most powerful piece of the equation. Seagate needs to know what our customers are experiencing because it may be new and different for us, too. We intend these tools and processes to help both data center architecture and hard drive science continue to evolve.

The post An Inside Look at Data Center Storage Integration: A Complex, Iterative, and Sustained Process appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

Advanced Cloud Backup Solutions

Post Syndicated from Roderick Bauer original https://www.backblaze.com/blog/advanced-cloud-backup-solutions/

Advanced Cloud Backup Solutions Diagram

Simple and easy backup is great for most people. For those who find configuring and managing backup challenging, it can mean the difference between having a computer regularly backed up and doing nothing at all.

For others, simple and easy doesn’t cut it. They have needs that go beyond install-and-forget, or have platforms and devices that aren’t handled by a program like Backblaze Computer Backup. Backing up Windows Servers, Linux boxes, NAS devices, or VMs requires specialized software. Beyond a device or platform need, there are advanced users who want to fine-tune or automate how backup fits into their other IT tasks.

For these people, there are a number of storage solutions with applications and integrations that will fulfill their needs. Backblaze B2 Cloud Storage, for example, is general purpose object cloud storage that can be used for a wide range of uses, including data backup and archiving. You can think of B2 as an infinitely large and secure storage drive in the cloud that’s ready for anything you want to do with it. Backblaze provides the storage, along with a robust API, web interface, and a CLI for the do-it-yourself crowd. In addition, there’s a long list of partner integrations to address specific market segments, platforms, use cases, and feature requirements. You just bring the data, and we get you backed up securely and affordably.
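
As a taste of the do-it-yourself route, here is a minimal sketch using Backblaze’s Python SDK (b2sdk) to push a single archive file into a bucket. The bucket name, key values, and file paths are placeholders, and the calls follow the b2sdk v2 quick-start pattern; treat it as an outline rather than production code.

```python
# A minimal sketch of the do-it-yourself route using the b2sdk library.
# Bucket names, key values, and file paths below are placeholders.
from b2sdk.v2 import InMemoryAccountInfo, B2Api

def upload_backup(key_id, app_key, bucket_name, local_path, remote_name):
    """Authorize against B2 and upload one local file into a bucket."""
    info = InMemoryAccountInfo()   # keeps credentials in memory only
    api = B2Api(info)
    api.authorize_account("production", key_id, app_key)
    bucket = api.get_bucket_by_name(bucket_name)
    bucket.upload_local_file(local_file=local_path, file_name=remote_name)

# Example (placeholder values):
upload_backup("000abc123", "K000secret", "my-archive-bucket",
              "/backups/2018-06-01.tar.gz", "servers/2018-06-01.tar.gz")
```

The same handful of calls (authorize, look up a bucket, upload a file) is all an automated backup script really needs; scheduling, retention, and encryption can be layered on top or delegated to one of the integrations discussed below.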

Advanced Backup Needs & Solutions

There’s a wide range of features and use cases that could be called advanced backup. Some of these cases go beyond what we define as backup and include archiving. We distinguish backup and archiving in the following way. Backup is making one or more copies of currently active data in case you want to retrieve a previous version of the data or something happens to your primary copy. Archiving is a record of your data at a moment in time. Backblaze Computer Backup is a data backup solution. Backblaze B2, being general purpose, can be used for either backup or archiving, depending on the user’s needs. Recovery is another possible use of object cloud storage that we’ll cover in future posts.

A Dozen Advanced Cloud Backup Use Cases

Below you’ll find a dozen capabilities, use cases, and features that could fall in the category of advanced cloud backup and archiving. All twelve can be addressed with a combination of Backblaze B2 Cloud Storage and a Backblaze solution, or in concert with one of our many integration partners.

1 — File and Directory/Folder Selection
The vast majority of users want all their data backed up. The Backblaze Computer Backup client backs everything up by default and lets users exclude what they want. Some advanced users prefer to manually select what specific drives, folders, and directories are included for backup and/or be able to set global rules for inclusion and exclusion.
2 — Deduplication, Snapshots
Some IT professionals are looking to deduplicate data across multiple machines before backups are made. Others want granular control of recovery to a specific point in time through the use of “snapshots.”
3 — Archiving and Custom Retention Policies, Lifecycle, Versioning
This feature set includes the ability to specify how long a given snapshot of data should be kept (e.g., how long do I want the version of my computer from Jan 7, 2009 to be saved?). Permutations of this feature set include how many versions of a backup file should be retained, and when they should be discarded, if desired.
4 — Platform and Interface
Most computer users are running on Windows or Macintosh, but others are on Linux, Solaris, FreeBSD, or other OSs. Clients and applications can be available in either command-line (CLI) or graphical user interface (GUI) versions, or sometimes both.
Macintosh, Windows, Linux
5 — Servers and NAS
A common need of advanced users and IT managers is the ability to back up servers and Network-Attached Storage (NAS) devices. In some cases, the servers are virtual machines (VMs) that have special backup needs.
Server
6 — Media
Video and photos have their own backup requirements and workflows. These include the ability to attach metadata about where and when the media was created, what equipment was used, what the subject and content are, and other information. Special search technologies are used to retrieve the stored media. Media files also tend to be large, which brings with it extra storage requirements. People who work with media have specific workflows they use, and ways of working with collaborators and production resources that put demands on the data storage. Transcoding and other processes may be available that can change or repurpose the stored data.
Up, Up to the Cloud
7 — Local and Cloud Backups, Multiple Destinations
Some advanced backup needs include backing up to multiple destinations, which can include a local drive or network device, a server, or various remote destinations using various connection and file transfer protocols (FTP, SMB, SSH, WebDav, etc.).
8 — Advanced Scheduling & Custom Actions, Automation
Advanced backup often includes the ability to schedule actions to be repeated at specific times and dates, or to be triggered by actions or conditions, such as if a file is changed or a program has been run and had a specific outcome.
9 — Advanced Security & Encryption
Security is a concern of every person storing data, but some users have requirements for how and where data is encrypted, where the keys are stored, who has access, and recovery options. In addition, some organizations, agencies, and governments have specific requirements about what type of encryption is used, who has access to the data, and other matters.
10 — Mass Data Ingress
Some users have large amounts of data that would take a long time to transfer even over the fastest network connection. These users are interested in other ways of seeding cloud storage, including shipping a physical drive directly to the cloud purveyor.
Backblaze B2 Fireball
11 — Virtual/Hybrid Storage
A local storage device seamlessly and transparently extends storage needs to the cloud.
12 — WordPress Backup
WordPress is the most popular CMS (Content Management System) for websites, with almost 30% of all websites in the world using WordPress — over 350 million. Backup systems for WordPress typically integrate directly with WordPress through a free or paid plugin installed with WordPress.
Read more about it:

A Wide Range of Backup Options and Capabilities

By combining general purpose object cloud storage with custom or off-the-shelf integrations, a wide range of solutions can be created for advanced backup and archiving. We’ve already addressed a number of the use cases above on this blog (see links in the sections above), and we’ll be addressing more in future posts. We hope you’ll come back and check those out.

Let Us Know What You’re Doing for Advanced Backup and Archiving

Please tell us about any advanced uses you have, or uses you would like to see addressed in future posts. Just drop us a note in the comments.

The post Advanced Cloud Backup Solutions appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

AWS Online Tech Talks – June 2018

Post Syndicated from Devin Watson original https://aws.amazon.com/blogs/aws/aws-online-tech-talks-june-2018/

AWS Online Tech Talks – June 2018

Join us this month to learn about AWS services and solutions. New this month, we have a fireside chat with the GM of Amazon WorkSpaces and our 2nd episode of the “How to re:Invent” series. We’ll also cover best practices, deep dives, use cases and more! Join us and register today!

Note – All sessions are free and in Pacific Time.

Tech talks featured this month:

 

Analytics & Big Data

June 18, 2018 | 11:00 AM – 11:45 AM PT – Get Started with Real-Time Streaming Data in Under 5 Minutes – Learn how to use Amazon Kinesis to capture, store, and analyze streaming data in real-time including IoT device data, VPC flow logs, and clickstream data.
June 20, 2018 | 11:00 AM – 11:45 AM PT – Insights For Everyone – Deploying Data across your Organization – Learn how to deploy data at scale using AWS Analytics and QuickSight’s new reader role and usage based pricing.

 

AWS re:Invent
June 13, 2018 | 05:00 PM – 05:30 PM PT – Episode 2: AWS re:Invent Breakout Content Secret Sauce – Hear from one of our own AWS content experts as we dive deep into the re:Invent content strategy and how we maintain a high bar.
Compute

June 25, 2018 | 01:00 PM – 01:45 PM PT – Accelerating Containerized Workloads with Amazon EC2 Spot Instances – Learn how to efficiently deploy containerized workloads and easily manage clusters at any scale at a fraction of the cost with Spot Instances.

June 26, 2018 | 01:00 PM – 01:45 PM PT – Ensuring Your Windows Server Workloads Are Well-Architected – Get the benefits, best practices and tools on running your Microsoft Workloads on AWS leveraging a well-architected approach.

 

Containers
June 25, 2018 | 09:00 AM – 09:45 AM PT – Running Kubernetes on AWS – Learn the basics of running Kubernetes on AWS, including how to set up masters, networking, and security, and how to add auto-scaling to your cluster.

 

Databases

June 18, 2018 | 01:00 PM – 01:45 PM PT – Oracle to Amazon Aurora Migration, Step by Step – Learn how to migrate your Oracle database to Amazon Aurora.
DevOps

June 20, 2018 | 09:00 AM – 09:45 AM PT – Set Up a CI/CD Pipeline for Deploying Containers Using the AWS Developer Tools – Learn how to set up a CI/CD pipeline for deploying containers using the AWS Developer Tools.

 

Enterprise & Hybrid
June 18, 2018 | 09:00 AM – 09:45 AM PT – De-risking Enterprise Migration with AWS Managed Services – Learn how enterprise customers are de-risking cloud adoption with AWS Managed Services.

June 19, 2018 | 11:00 AM – 11:45 AM PT – Launch AWS Faster using Automated Landing Zones – Learn how the AWS Landing Zone can automate the set up of best practice baselines when setting up new AWS Environments.

June 21, 2018 | 11:00 AM – 11:45 AM PT – Leading Your Team Through a Cloud Transformation – Learn how you can help lead your organization through a cloud transformation.

June 21, 2018 | 01:00 PM – 01:45 PM PT – Enabling New Retail Customer Experiences with Big Data – Learn how AWS can help retailers realize actual value from their big data and deliver on differentiated retail customer experiences.

June 28, 2018 | 01:00 PM – 01:45 PM PT – Fireside Chat: End User Collaboration on AWS – Learn how End User Compute services can help you deliver access to desktops and applications anywhere, anytime, using any device.
IoT

June 27, 2018 | 11:00 AM – 11:45 AM PT – AWS IoT in the Connected Home – Learn how to use AWS IoT to build innovative Connected Home products.

 

Machine Learning

June 19, 2018 | 09:00 AM – 09:45 AM PT – Integrating Amazon SageMaker into your Enterprise – Learn how to integrate Amazon SageMaker and other AWS Services within an Enterprise environment.

June 21, 2018 | 09:00 AM – 09:45 AM PT – Building Text Analytics Applications on AWS using Amazon Comprehend – Learn how you can unlock the value of your unstructured data with NLP-based text analytics.

 

Management Tools

June 20, 2018 | 01:00 PM – 01:45 PM PT – Optimizing Application Performance and Costs with Auto Scaling – Learn how selecting the right scaling option can help optimize application performance and costs.

 

Mobile
June 25, 2018 | 11:00 AM – 11:45 AM PT – Drive User Engagement with Amazon Pinpoint – Learn how Amazon Pinpoint simplifies and streamlines effective user engagement.

 

Security, Identity & Compliance

June 26, 2018 | 09:00 AM – 09:45 AM PT – Understanding AWS Secrets Manager – Learn how AWS Secrets Manager helps you rotate and manage access to secrets centrally.
June 28, 2018 | 09:00 AM – 09:45 AM PT – Using Amazon Inspector to Discover Potential Security Issues – See how Amazon Inspector can be used to discover security issues of your instances.

 

Serverless

June 19, 2018 | 01:00 PM – 01:45 PM PT – Productionize Serverless Application Building and Deployments with AWS SAM – Learn expert tips and techniques for building and deploying serverless applications at scale with AWS SAM.

 

Storage

June 26, 2018 | 11:00 AM – 11:45 AM PT – Deep Dive: Hybrid Cloud Storage with AWS Storage Gateway – Learn how you can reduce your on-premises infrastructure by using the AWS Storage Gateway to connect your applications to the scalable and reliable AWS storage services.
June 27, 2018 | 01:00 PM – 01:45 PM PT – Changing the Game: Extending Compute Capabilities to the Edge – Discover how to change the game for IIoT and edge analytics applications with AWS Snowball Edge plus enhanced Compute instances.
June 28, 2018 | 11:00 AM – 11:45 AM PT – Big Data and Analytics Workloads on Amazon EFS – Get best practices and deployment advice for running big data and analytics workloads on Amazon EFS.

Hiring a Director of Sales

Post Syndicated from Yev original https://www.backblaze.com/blog/hiring-a-director-of-sales/

Backblaze is hiring a Director of Sales. This is a critical role for Backblaze as we continue to grow the team. We need a strong leader who has experience in scaling a sales team and an excellent track record of exceeding goals by selling Software as a Service (SaaS) solutions. In addition, this leader will need to be highly motivated and able to create and develop a success-oriented sales team that has fun and enjoys what they do.

The History of Backblaze from our CEO
In 2007, after a friend’s computer crash caused her some suffering, we realized that with every photo, video, song, and document going digital, everyone would eventually lose all of their information. Five of us quit our jobs to start a company with the goal of making it easy for people to back up their data.

Like many startups, for a while we worked out of a co-founder’s one-bedroom apartment. Unlike most startups, we made an explicit agreement not to raise funding during the first year. We would then touch base every six months and decide whether to raise or not. We wanted to focus on building the company and the product, not on pitching and slide decks. And critically, we wanted to build a culture that understood money comes from customers, not the magical VC giving tree. Over the course of 5 years we built a profitable, multi-million dollar revenue business — and only then did we raise a VC round.

Fast forward 10 years and our world looks quite different. You'll have some fantastic assets to work with:

  • A brand millions recognize for openness, ease-of-use, and affordability.
  • A computer backup service that stores over 500 petabytes of data, has recovered over 30 billion files for hundreds of thousands of paying customers — most of whom self-identify as being the people that find and recommend technology products to their friends.
  • Our B2 service, which provides the lowest cost cloud storage on the planet at one-fourth the price that Amazon, Google, or Microsoft charges. While it is a newer product on the market, it already has over 100,000 IT professionals and developers signed up, as well as an ecosystem building up around it.
  • A growing, profitable and cash-flow positive company.
  • And last, but most definitely not least: a great sales team.

You might be saying, “sounds like you’ve got this under control — why do you need me?” Don’t be misled. We need you. Here’s why:

  • We have a great team, but we are in the process of expanding and we need to develop a structure that will easily scale and provide the most success to drive revenue.
  • We just launched our outbound sales efforts and we need someone to help develop that into a fully successful program that’s building a strong pipeline and closing business.
  • We need someone to work with the marketing department and figure out how to generate more inbound opportunities that the sales team can follow up on and close.
  • We need someone who will work closely in developing the skills of our current sales team and build a path for career growth and advancement.
  • We want someone to manage our Customer Success program.

So that’s a bit about us. What are we looking for in you?

Experience: As a sales leader, you will strategically build and drive the territory's sales pipeline by assembling and leading a skilled team of sales professionals. This leader should be familiar with generating, developing, and closing software subscription (SaaS) opportunities. We are looking for a self-starter who can manage a team and make an immediate impact selling our Backup and Cloud Storage solutions. In this role, the sales leader will work closely with the VP of Sales, marketing staff, and service staff to develop and implement specific strategic plans to achieve and exceed revenue targets, including new business acquisition as well as building out our customer success program.

Leadership: We have an experienced team who’s brought us to where we are today. You need to have the people and management skills to get them excited about working with you. You need to be a strong leader and compassionate about developing and supporting your team.

Data driven and creative: The data has to show something makes sense before we scale it up. However, without creativity, it’s easy to say “the data shows it’s impossible” or to find a local maximum. Whether it’s deciding how to scale the team, figuring out what our outbound sales efforts should look like or putting a plan in place to develop the team for career growth, we’ve seen a bit of creativity get us places a few extra dollars couldn’t.

Jive with our culture: Strong leaders affect culture and the person we hire for this role may well shape, not only fit into, ours. But to shape the culture you have to be accepted by the organism, which means a certain set of shared values. We default to openness with our team, our customers, and everyone if possible. We love initiative — without arrogance or dictatorship. We work to create a place people enjoy showing up to work. That doesn’t mean ping pong tables and foosball (though we do try to have perks & fun), but it means people are friendly, non-political, working to build a good service but also a good place to work.

Do the work: Ideas and strategy are critical, but good execution makes them happen. We’re looking for someone who can help the team execute both from the perspective of being capable of guiding and organizing, but also someone who is hands-on themselves.

Additional Responsibilities needed for this role:

  • Recruit, coach, mentor, manage and lead a team of sales professionals to achieve yearly sales targets. This includes closing new business and expanding upon existing clientele.
  • Expand the customer success program to provide the best customer experience possible resulting in upsell opportunities and a high retention rate.
  • Develop effective sales strategies and deliver compelling product demonstrations and sales pitches.
  • Acquire and develop the appropriate sales tools to make the team efficient in their daily work flow.
  • Apply a thorough understanding of the marketplace, industry trends, funding developments, and products to all management activities and strategic sales decisions.
  • Ensure that sales department operations function smoothly, with the goal of facilitating sales and/or closings; operational responsibilities include accurate pipeline reporting and sales forecasts.
  • This position will report directly to the VP of Sales and will be based in our headquarters in San Mateo, CA.

Requirements:

  • 7 – 10+ years of successful sales leadership experience as measured by sales performance against goals.
  • Experience in developing skill sets and providing career growth and opportunities through advancement of team members.
  • Background in selling SaaS technologies with a strong track record of success.
  • Strong presentation and communication skills.
  • Must be able to travel occasionally nationwide.
  • BA/BS degree required.

Think you want to join us on this adventure?
Send an email to jobscontact@backblaze.com with the subject “Director of Sales.” (Recruiters and agencies, please don’t email us.) Include a resume and answer these two questions:

  1. How would you approach evaluating the current sales team and what is your process for developing a growth strategy to scale the team?
  2. What are the goals you would set for yourself in the 3 month and 1-year timeframes?

Thank you for taking the time to read this and I hope that this sounds like the opportunity for which you’ve been waiting.

Backblaze is an Equal Opportunity Employer.

The post Hiring a Director of Sales appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

Getting Rid of Your Mac? Here’s How to Securely Erase a Hard Drive or SSD

Post Syndicated from Roderick Bauer original https://www.backblaze.com/blog/how-to-wipe-a-mac-hard-drive/

erasing a hard drive and a solid state drive

What do I do with a Mac that still has personal data on it? Do I take out the disk drive and smash it? Do I sweep it with a really strong magnet? Is there a difference in how I handle a hard drive (HDD) versus a solid-state drive (SSD)? Well, taking a sledgehammer or projectile weapon to your old machine is certainly one way to make the data irretrievable, and it can be enormously cathartic as long as you follow appropriate safety and disposal protocols. But there are far less destructive ways to make sure your data is gone for good. Let me introduce you to secure erasing.

Which Type of Drive Do You Have?

Before we start, you need to know whether you have an HDD or an SSD. To find out, or at least to make sure, click the Apple menu and select "About This Mac." Once there, select the "Storage" tab to see which type of drive is in your system.

The first example, below, shows a SATA Disk (HDD) in the system.

SATA HDD

In the next case, we see we have a Solid State SATA Drive (SSD), plus a Mac SuperDrive.

Mac storage dialog showing SSD

The third screen shot shows an SSD, as well. In this case it’s called “Flash Storage.”

Flash Storage
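If you'd rather check from the command line, Terminal can tell you the same thing. This is only a quick sketch: disk0 is the usual identifier for the internal drive, but confirm it with diskutil list first, and note that the "Solid State" field appears in diskutil info output on recent versions of macOS.

diskutil list
diskutil info disk0 | grep "Solid State"

If the second command prints "Solid State: Yes," you have an SSD; "No" means a conventional hard drive.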

Make Sure You Have a Backup

Before you get started, you’ll want to make sure that any important data on your hard drive has moved somewhere else. OS X’s built-in Time Machine backup software is a good start, especially when paired with Backblaze. You can learn more about using Time Machine in our Mac Backup Guide.

With a local backup copy in hand and secure cloud storage, you know your data is always safe no matter what happens.

Once you’ve verified your data is backed up, roll up your sleeves and get to work. The key is OS X Recovery — a special part of the Mac operating system since OS X 10.7 “Lion.”

How to Wipe a Mac Hard Disk Drive (HDD)

NOTE: If you’re interested in wiping an SSD, see below.

    1. Make sure your Mac is turned off.
    2. Press the power button.
    3. Immediately hold down the command and R keys.
    4. Wait until the Apple logo appears.
    5. Select “Disk Utility” from the OS X Utilities list. Click Continue.
    6. Select the disk you’d like to erase by clicking on it in the sidebar.
    7. Click the Erase button.
    8. Click the Security Options button.
    9. The Security Options window includes a slider that enables you to determine how thoroughly you want to erase your hard drive.

There are four notches to that Security Options slider. “Fastest” is quick but insecure — data could potentially be rebuilt using a file recovery app. Moving that slider to the right introduces progressively more secure erasing. Disk Utility’s most secure level erases the information used to access the files on your disk, then writes zeroes across the disk surface seven times to help remove any trace of what was there. This setting conforms to the DoD 5220.22-M specification.

    10. Once you’ve selected the level of secure erasing you’re comfortable with, click the OK button.
    11. Click the Erase button to begin. Bear in mind that the more secure method you select, the longer it will take. The most secure methods can add hours to the process.

Once it’s done, the Mac’s hard drive will be clean as a whistle and ready for its next adventure: a fresh installation of OS X, being donated to a relative or a local charity, or just sent to an e-waste facility. Of course you can still drill a hole in your disk or smash it with a sledgehammer if it makes you happy, but now you know how to wipe the data from your old computer with much less ruckus.
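If you're comfortable working in Terminal, diskutil can perform the same kind of secure erase from the command line, including from the Recovery System's Terminal. The following is only a sketch: disk2 is a placeholder identifier (confirm the correct one with diskutil list, because erasing the wrong disk is unrecoverable), and level 2 corresponds to a 7-pass erase, using the same 0 through 4 levels described later in this post.

diskutil secureErase 2 /dev/disk2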

The above instructions apply to older Macintoshes with HDDs. What do you do if you have an SSD?

Securely Erasing SSDs, and Why Not To

Most new Macs ship with solid state drives (SSDs). Only the iMac and Mac mini ship with regular hard drives anymore, and even those are available in pure SSD variants if you want.

If your Mac comes equipped with an SSD, Apple’s Disk Utility software won’t actually let you zero the hard drive.

Wait, what?

In a tech note posted to Apple’s own online knowledgebase, Apple explains that you don’t need to securely erase your Mac’s SSD:

With an SSD drive, Secure Erase and Erasing Free Space are not available in Disk Utility. These options are not needed for an SSD drive because a standard erase makes it difficult to recover data from an SSD.

In fact, some folks will tell you not to zero out the data on an SSD, since it can cause wear and tear on the memory cells that, over time, can affect its reliability. I don’t think that’s nearly as big an issue as it used to be — SSD reliability and longevity has improved.

If “Standard Erase” doesn’t quite make you feel comfortable that your data can’t be recovered, there are a couple of options.

FileVault Keeps Your Data Safe

One way to make sure that your SSD’s data remains secure is to use FileVault. FileVault is whole-disk encryption for the Mac. With FileVault engaged, you need a password to access the information on your hard drive. Without the password, the data on the drive remains encrypted and unreadable.

There’s one potential downside of FileVault — if you lose your password or the encryption key, you’re screwed: You’re not getting your data back any time soon. Based on my experience working at a Mac repair shop, losing a FileVault key happens more frequently than it should.

When you first set up a new Mac, you’re given the option of turning FileVault on. If you don’t do it then, you can turn on FileVault at any time by clicking on your Mac’s System Preferences, clicking on Security & Privacy, and clicking on the FileVault tab. Be warned, however, that the initial encryption process can take hours, as will decryption if you ever need to turn FileVault off.
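If you prefer the command line, FileVault can also be turned on with Apple's fdesetup tool. The two commands below are a sketch of that route: the first enables FileVault (you'll be prompted for administrator credentials and shown a recovery key that you should store somewhere safe), and the second reports whether FileVault is currently on or off.

sudo fdesetup enable
fdesetup status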

With FileVault turned on, you can restart your Mac into its Recovery System (by restarting the Mac while holding down the command and R keys) and erase the hard drive using Disk Utility, once you’ve unlocked it (by selecting the disk, clicking the File menu, and clicking Unlock). That deletes the FileVault key, which means any data on the drive is useless.

FileVault doesn’t impact the performance of most modern Macs, though I’d suggest only using it if your Mac has an SSD, not a conventional hard disk drive.

Securely Erasing Free Space on Your SSD

If you don’t want to take Apple’s word for it, if you’re not using FileVault, or if you just want to, there is a way to securely erase free space on your SSD. It’s a little more involved but it works.

Before we get into the nitty-gritty, let me state for the record that this really isn’t necessary to do, which is why Apple’s made it so hard to do. But if you’re set on it, you’ll need to use Apple’s Terminal app. Terminal provides you with command line interface access to the OS X operating system. Terminal lives in the Utilities folder, but you can access Terminal from the Mac’s Recovery System, as well. Once your Mac has booted into the Recovery partition, click the Utilities menu and select Terminal to launch it.

From a Terminal command line, type:

diskutil secureErase freespace VALUE /Volumes/DRIVE

That tells your Mac to securely erase the free space on your SSD. You’ll need to change VALUE to a number between 0 and 4. 0 is a single-pass run of zeroes; 1 is a single-pass run of random numbers; 2 is a 7-pass erase; 3 is a 35-pass erase; and 4 is a 3-pass erase. DRIVE should be changed to the name of your drive. To run a 7-pass erase of the SSD in “JohnB-Macbook”, you would enter the following:

diskutil secureErase freespace 2 /Volumes/JohnB-Macbook

And remember, if you used a space in the name of your Mac’s hard drive, you need to escape the space with a backslash. For example, to run a 35-pass erase on a hard drive called “Macintosh HD” you enter the following:

diskutil secureErase freespace 3 /Volumes/Macintosh\ HD

Something to remember is that the more extensive the erase procedure, the longer it will take.

When Erasing is Not Enough — How to Destroy a Drive

If you absolutely, positively need to be sure that all the data on a drive is irretrievable, see this Scientific American article (with contributions by Gleb Budman, Backblaze CEO), How to Destroy a Hard Drive — Permanently.

The post Getting Rid of Your Mac? Here’s How to Securely Erase a Hard Drive or SSD appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

Welcome Jack — Data Center Tech

Post Syndicated from Yev original https://www.backblaze.com/blog/welcome-jack-data-center-tech/

As we shoot way past 500 petabytes of data stored, we need a lot of helping hands in the data center to keep those hard drives spinning! We’ve been hiring quite a lot, and our latest addition is Jack. Let’s learn a bit more about him, shall we?

What is your Backblaze Title?
Data Center Tech

Where are you originally from?
Walnut Creek, CA until 7th grade when the family moved to Durango, Colorado.

What attracted you to Backblaze?
I had heard about how cool the Backblaze community is and have always been fascinated by technology.

What do you expect to learn while being at Backblaze?
I expect to learn a lot about how our data centers run and all of the hardware behind it.

Where else have you worked?
Garrhs HVAC as an HVAC Installer and then Durango Electrical as a Low Volt Technician.

Where did you go to school?
Durango High School and then Montana State University.

What’s your dream job?
I would love to be a driver for the Audi Sport. Race cars are so much fun!

Favorite place you’ve traveled?
Iceland has definitely been my favorite so far.

Favorite hobby?
Video games.

Of what achievement are you most proud?
Getting my Eagle Scout badge was a tough, but rewarding experience that I will always cherish.

Star Trek or Star Wars?
Star Wars.

Coke or Pepsi?
Coke…I know, it’s bad.

Favorite food?
Thai food.

Why do you like certain things?
I tend to warm up to things the more time I spend around them, although I never really know until it happens.

Anything else you’d like to tell us?
I’m a friendly car guy who will always be in love with my European cars and I really enjoy the Backblaze community!

We’re happy you joined us out West! Welcome aboard, Jack!

The post Welcome Jack — Data Center Tech appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

Replacing macOS Server with Synology NAS

Post Syndicated from Roderick Bauer original https://www.backblaze.com/blog/replacing-macos-server-with-synology-nas/

Synology NAS boxes backed up to the cloud

Businesses and organizations that rely on macOS server for essential office and data services are facing some decisions about the future of their IT services.

Apple recently announced that it is deprecating a significant portion of essential network services in macOS Server, as they described in a support statement posted on April 24, 2018, “Prepare for changes to macOS Server.” Apple’s note includes:

macOS Server is changing to focus more on management of computers, devices, and storage on your network. As a result, some changes are coming in how Server works. A number of services will be deprecated, and will be hidden on new installations of an update to macOS Server coming in spring 2018.

The note lists the services that will be removed in a future release of macOS Server, including calendar and contact support, Dynamic Host Configuration Protocol (DHCP), Domain Name Services (DNS), mail, instant messages, virtual private networking (VPN), NetInstall, Web server, and the Wiki.

Apple assures users who have already configured any of the listed services that they will be able to use them in the spring 2018 macOS Server update, but the statement ends with links to a number of alternative services, including hosted services, that macOS Server users should consider as viable replacements for the features it is removing. Many of these alternatives are free and open-source software (FOSS).

As difficult as this could be for organizations that use macOS Server, this is not unexpected. Apple left the server hardware space back in 2010, when Steve Jobs announced the company was ending its line of Xserve rackmount servers, which were introduced in May 2002. Since then, macOS Server has hardly been a prominent part of Apple’s product lineup. It’s not just the product itself that has lost some luster, but the entire category of SMB office and business servers, which has been undergoing a gradual change in recent years.

Some might wonder how important the news about macOS Server is, given that macOS Server represents a pretty small share of the server market. macOS Server has been important to design shops, agencies, education users, and small businesses that likely have been on Macs for ages, but it’s not a significant part of the IT infrastructure of larger organizations and businesses.

What Comes After macOS Server?

Lovers of macOS Server don’t have to fear having their Mac minis pried from their cold, dead hands quite yet. Installed services will continue to be available. In the fall of 2018, new installations and upgrades of macOS Server will require users to migrate most services to other software. Since many of the services of macOS Server were already open-source, this means that a change in software might not be required. It does mean more configuration and management required from those who continue with macOS Server, however.

Users can continue with macOS Server if they wish, but many will see the writing on the wall and look for a suitable substitute.

The Times They Are A-Changin’

For many people working in organizations, what is significant about this announcement is how it reflects the move away from the once ubiquitous server-based IT infrastructure. Services that used to be centrally managed and office-based, such as storage, file sharing, communications, and computing, have moved to the cloud.

In selecting the next office IT platforms, there’s an opportunity to move to solutions that reflect and support how people are working and the applications they are using both in the office and remotely. For many, this means including cloud-based services in office automation, backup, and business continuity/disaster recovery planning. This includes Software as a Service, Platform as a Service, and Infrastructure as a Service (SaaS, PaaS, IaaS) options.

IT solutions that integrate well with the cloud are worth strong consideration for what comes after a macOS Server-based environment.

Synology NAS as a macOS Server Alternative

One solution that is becoming popular is to replace macOS Server with a device that has the ability to provide important office services, but also bridges the office and cloud environments. Using Network-Attached Storage (NAS) to take up the server slack makes a lot of sense. Many customers are already using NAS for file sharing, local data backup, automatic cloud backup, and other uses. In the case of Synology, their operating system, Synology DiskStation Manager (DSM), is Linux based, and integrates the basic functions of file sharing, centralized backup, RAID storage, multimedia streaming, virtual storage, and other common functions.

Synology NAS box

Synology NAS

Since DSM is based on Linux, there are numerous server applications available, including many of the same ones that are available for macOS Server, which shares conceptual roots with Linux since macOS descends from BSD Unix.

Synology DiskStation Manager Package Center screenshot

Synology DiskStation Manager Package Center

According to Ed Lukacs, COO at 2FIFTEEN Systems Management in Salt Lake City, their customers have found the move from macOS Server to Synology NAS not only painless, but positive. DSM works seamlessly with macOS and has been faster for their customers, as well. Many of their customers are running Adobe Creative Suite and Google G Suite applications, so a workflow that combines local storage, remote access, and the cloud, is already well known to them. Remote users are supported by Synology’s QuickConnect or VPN.

Business continuity and backup are simplified by the flexible storage capacity of the NAS. Synology has built-in backup to Backblaze B2 Cloud Storage with Synology’s Cloud Sync, as well as a choice of a number of other B2-compatible applications, such as Cloudberry, Comet, and Arq.

Customers have been able to get up and running quickly, with only initial data transfers requiring some time to complete. After that, management of the NAS can be handled in-house or with the support of a Managed Service Provider (MSP).

Are You Sticking with macOS Server or Moving to Another Platform?

If you’re affected by this change in macOS Server, please let us know in the comments how you’re planning to cope. Are you using Synology NAS for server services? Please tell us how that’s working for you.

The post Replacing macOS Server with Synology NAS appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.