All posts by Pat Patterson

Querying a Decade of Drive Stats Data

Post Syndicated from Pat Patterson original https://www.backblaze.com/blog/querying-a-decade-of-drive-stats-data/

Last week, we published Backblaze Drive Stats for Q3 2022, sharing the metrics we’ve gathered on our fleet of over 230,000 hard drives. In this blog post, I’ll explain how we’re now using the Trino open source SQL query engine to ensure the integrity of Drive Stats data, and how we plan to use Trino in the future to generate the Drive Stats result set for publication.

Converting Zipped CSV Files into Parquet

In his blog post Storing and Querying Analytical Data in Backblaze B2, my colleague Greg Hamer explained how we started using Trino to analyze Drive Stats data earlier this year. We quickly discovered that formatting the data set as Apache Parquet minimized the amount of data that Trino needed to download from Backblaze B2 Cloud Storage to process queries, resulting in a dramatic improvement in query performance over the original CSV-formatted data.

As Greg mentioned in the earlier post, Drive Stats data is published quarterly to Backblaze B2 as a single .zip file containing a CSV file for each day of the quarter. Each CSV file contains a record for each drive that was operational on that day (see this list of the fields in each record).

When Greg and I started working with the Parquet-formatted Drive Stats data, we took a simple, but somewhat inefficient, approach to converting the data from zipped CSV to Parquet:

  • Download the existing zip files to local storage.
  • Unzip them.
  • Run a Python script to read the CSV files and write Parquet-formatted data back to local storage.
  • Upload the Parquet files to Backblaze B2.

We were keen to automate this process, so we reworked the script to use the Python zipfile module to read the zipped CSV data directly from its Backblaze B2 Bucket and write Parquet back to another bucket. We’ve shared the script in this GitHub gist.
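To give a feel for the zip-reading half of that approach, here is a minimal sketch using only the Python standard library. It is not the script from the gist: the function name is made up for illustration, the zip is passed in as bytes rather than fetched from a bucket, and I'm assuming each member is named after its day (for example, 2022-09-30.csv). Writing Parquet is left out, since that needs a third-party library.

```python
import csv
import io
import zipfile
from datetime import date


def iter_drive_records(zip_bytes):
    """Yield (day, row) pairs for every CSV row in a zipped quarter of data.

    Assumption for this sketch: each member is named like '2022-09-30.csv',
    one file per day, so the day can be derived from the file name.
    """
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for name in sorted(archive.namelist()):
            if not name.endswith(".csv"):
                continue  # skip directories and any non-CSV members
            # Strip any folder prefix and the extension to get the ISO date
            stem = name.rsplit("/", 1)[-1][: -len(".csv")]
            day = date.fromisoformat(stem)
            with archive.open(name) as member:
                # Decode the compressed member as text without unzipping to disk
                text = io.TextIOWrapper(member, encoding="utf-8", newline="")
                for row in csv.DictReader(text):
                    yield day, row
```

Each yielded day supplies the year, month, and day values for a record, and each row is a dictionary keyed by the CSV header.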

After running the script, the drivestats table now contains data up until the end of Q3 2022:

trino:ds> SELECT DISTINCT year, month, day 
FROM drivestats ORDER BY year DESC, month DESC, day DESC LIMIT 1;
year | month | day 
------+-------+-----
 2022 |     9 |  30 
(1 row)

In the last article, we were working with data running until the end of Q1 2022. On March 31, 2022, the Drive Stats dataset comprised 296 million records, and there were 211,732 drives in operation. Let’s see what the current situation is:

trino:ds> SELECT COUNT(*) FROM drivestats;
   _col0 
-----------
 346006813 
(1 row) 

trino:ds> SELECT COUNT(*) FROM drivestats 
    WHERE year = 2022 AND month = 9 AND day = 30;
   _col0 
--------
 230897 
(1 row)

So, since the end of March, we’ve added 50 million rows to the dataset, and Backblaze is now spinning nearly 231,000 drives—over 19,000 more than at the end of March 2022. Put another way, we’ve added more than 100 drives per day to the Backblaze Cloud Storage Platform in the past six months. Finally, how many exabytes of raw data storage does Backblaze now manage?
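If you'd like to check that arithmetic, here is a quick sketch in Python. The Q1 row count is the rounded 296 million figure quoted above, so the number of rows added is approximate.

```python
from datetime import date

rows_q1, rows_q3 = 296_000_000, 346_006_813   # Q1 value is the rounded figure
drives_q1, drives_q3 = 211_732, 230_897

rows_added = rows_q3 - rows_q1                 # roughly 50 million
drives_added = drives_q3 - drives_q1           # drives added since March 31
days = (date(2022, 9, 30) - date(2022, 3, 31)).days
drives_per_day = drives_added / days

print(f"{rows_added:,} rows, {drives_added:,} drives, "
      f"{drives_per_day:.1f} drives/day over {days} days")
```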

trino:ds> SELECT ROUND(SUM(CAST(capacity_bytes AS bigint))/1e+18, 2)
FROM drivestats WHERE year = 2022 AND month = 9 AND day = 30;
 _col0 
-------
  2.62 
(1 row)

Will we cross the three exabyte mark this year? Stay tuned to find out.

Ensuring the Integrity of Drive Stats Data

As Andy Klein, the Drive Stats supremo, collates each quarter’s data, he looks for instances of healthy drives being removed and then returned to service. This can happen for a variety of operational reasons, but it shows up in the data as the drive having failed, then later revived. This subset of data shows the phenomenon:

trino:ds> SELECT year, month, day, failure
FROM drivestats
WHERE serial_number = 'ZHZ4VLNV' AND year >= 2021
ORDER BY year, month, day;
 year | month | day | failure 
------+-------+-----+---------
...
 2021 |    12 |  26 |       0 
 2021 |    12 |  27 |       0 
 2021 |    12 |  28 |       0 
 2021 |    12 |  29 |       1 
 2022 |     1 |   3 |       0 
 2022 |     1 |   4 |       0 
 2022 |     1 |   5 |       0 
...

This drive appears to have failed on Dec 29, 2021, but was returned to service on Jan 3, 2022.

Since these spurious “failures” would skew the reliability statistics, Andy searches for and removes them from each quarter’s data. However, even Andy can’t see into the future, so, when a drive is taken offline at the end of one quarter and then returned to service in the next quarter, as in the above case, there is a bit of a manual process to find anomalies and clean up past data.

With the entire dataset in a single location, we can now write a SQL query to find drives that were removed, then returned to service, no matter when it occurred. Let’s build that query up in stages.

We start by finding the serial numbers and failure dates for each drive failure:

trino:ds> SELECT serial_number,
    DATE(FORMAT('%04d-%02d-%02d', year, month, day)) AS date
FROM drivestats
WHERE failure = 1;
  serial_number  |    date    
-----------------+------------
 ZHZ3KMX4        | 2021-04-01 
 ZA12RBBM        | 2021-04-01 
 S300Z52X        | 2017-03-01 
 Z3051FWK        | 2017-03-01 
 Z304JQAE        | 2017-03-02 
...
(17092 rows)

Now we find the most recent record for each drive:

trino:ds> SELECT serial_number,
    MAX(DATE(FORMAT('%04d-%02d-%02d', year, month, day))) AS date
FROM drivestats
GROUP BY serial_number;
  serial_number   |    date    
------------------+------------
 ZHZ65F2W         | 2022-09-30 
 ZLW0GQ82         | 2022-09-30 
 ZLW0GQ86         | 2022-09-30 
 Z8A0A057F97G     | 2022-09-30 
 ZHZ62XAR         | 2022-09-30 
...
(329908 rows)

We then join the two result sets to find spurious failures; that is, failures where the drive was later returned to service. Note the join condition—we select records whose serial numbers match and where the most recent record is later than the failure:

trino:ds> SELECT f.serial_number, f.failure_date
FROM (
    SELECT serial_number,
        DATE(FORMAT('%04d-%02d-%02d', year, month, day)) AS failure_date
    FROM drivestats 
    WHERE failure = 1
) AS f
INNER JOIN (
    SELECT serial_number,
        MAX(DATE(FORMAT('%04d-%02d-%02d', year, month, day))) AS last_date
    FROM drivestats 
    GROUP BY serial_number
) AS l
ON f.serial_number = l.serial_number AND l.last_date > f.failure_date;
  serial_number  | failure_date 
-----------------+--------------
 2003261ED34D    | 2022-06-09 
 W300STQ5        | 2022-06-11 
 ZHZ61JMQ        | 2022-06-17 
 ZHZ4VL2P        | 2022-06-21 
 WD-WX31A2464044 | 2015-06-23 
(864 rows)

As you can see, the current schema makes date comparisons a little awkward, pointing the way to optimizing the schema by adding a DATE-typed column to the existing year, month, and day. This kind of denormalization is common in analytical data.
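To make the join condition concrete, here is a minimal Python sketch of the same logic over in-memory records. The function name and the tuple layout are made up for illustration; they are a simplified stand-in for rows of the drivestats table, not part of our tooling. Because each record carries a real date value, the comparison needs no string formatting.

```python
from datetime import date


def spurious_failures(records):
    """Return (serial_number, failure_date) pairs where the drive reported later.

    `records` is an iterable of (serial_number, date, failure) tuples.
    """
    records = list(records)
    last_seen = {}
    for serial, day, _failure in records:
        # Equivalent of MAX(date) ... GROUP BY serial_number
        if serial not in last_seen or day > last_seen[serial]:
            last_seen[serial] = day
    # Equivalent of the join condition: last_date > failure_date
    return sorted(
        (serial, day)
        for serial, day, failure in records
        if failure == 1 and last_seen[serial] > day
    )


sample = [
    ("ZHZ4VLNV", date(2021, 12, 28), 0),
    ("ZHZ4VLNV", date(2021, 12, 29), 1),   # marked failed...
    ("ZHZ4VLNV", date(2022, 1, 3), 0),     # ...but reporting again later
]
print(spurious_failures(sample))
```

A drive whose failure record is also its last record is a genuine failure and is left out of the result.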

Calculating the Quarterly Failure Rates

In calculating failure rates per drive model for each quarter, Andy loads the quarter’s data into MySQL and defines a set of views. We additionally define the current_quarter view to restrict the failure rate calculation to data in July, August, and September 2022:

CREATE VIEW current_quarter AS 
    SELECT * FROM drivestats
    WHERE year = 2022 AND month in (7, 8, 9);

CREATE VIEW drive_days AS 
    SELECT model, COUNT(*) AS drive_days 
    FROM current_quarter
    GROUP BY model;

CREATE VIEW failures AS
    SELECT model, COUNT(*) AS failures
    FROM current_quarter
    WHERE failure = 1
    GROUP BY model
UNION
    SELECT DISTINCT(model), 0 AS failures
    FROM current_quarter
    WHERE model NOT IN
    (
        SELECT model
        FROM current_quarter
        WHERE failure = 1
        GROUP BY model
    );

CREATE VIEW failure_rates AS
    SELECT drive_days.model AS model,
           drive_days.drive_days AS drive_days,
           failures.failures AS failures, 
           100.0 * (1.0 * failures) / (drive_days / 365.0)
               AS annual_failure_rate
    FROM drive_days, failures
    WHERE drive_days.model = failures.model;

Running the above statements in Trino, then querying the failure_rates view, yields a superset of the data that we published in the Q3 2022 Drive Stats report. The difference is that this result set includes drives that Andy excludes from the Drive Stats report: SSD boot drives, drives that were used for testing purposes, and drive models which did not have at least 60 drives in service:

trino:ds> SELECT * FROM failure_rates ORDER BY model;
        model         | drive_days | failures | annual_failure_rate 
----------------------+------------+----------+---------------------
 CT250MX500SSD1       |      32171 |        2 |                2.27 
 DELLBOSS VD          |      33706 |        0 |                0.00 
 HGST HDS5C4040ALE630 |       2389 |        0 |                0.00 
 HGST HDS724040ALE640 |         92 |        0 |                0.00 
 HGST HMS5C4040ALE640 |     341509 |        3 |                0.32 
 ...
 WDC WD60EFRX         |        276 |        0 |                0.00 
 WDC WDS250G2B0A      |       3867 |        0 |                0.00 
 WDC WUH721414ALE6L4  |     765990 |        5 |                0.24 
 WDC WUH721816ALE6L0  |     242954 |        0 |                0.00 
 WDC WUH721816ALE6L4  |     308630 |        6 |                0.71 
(74 rows)

Query 20221102_010612_00022_qscbi, FINISHED, 1 node
Splits: 139 total, 139 done (100.00%)
8.63 [82.4M rows, 5.29MB] [9.54M rows/s, 628KB/s]
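The annual_failure_rate expression in the failure_rates view is simple enough to check by hand. Here is a small Python sketch of the same formula (the function name is just for illustration), which you can compare against rows in the result set above:

```python
def annual_failure_rate(failures, drive_days):
    """Annualized failure rate as a percentage, mirroring the failure_rates view."""
    if drive_days <= 0:
        raise ValueError("drive_days must be positive")
    drive_years = drive_days / 365.0
    return 100.0 * failures / drive_years


# Two rows from the result set above
print(round(annual_failure_rate(2, 32171), 2))    # CT250MX500SSD1
print(round(annual_failure_rate(3, 341509), 2))   # HGST HMS5C4040ALE640
```

Dividing drive days by 365 converts them to drive years, so the result reads as the percentage of drives you would expect to fail over a year of service.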

Optimizing the Drive Stats Production Process

Now that we have shown that we can derive the required statistics by querying the Parquet-formatted data with Trino, we can streamline the Drive Stats process. Starting with the Q4 2022 report, rather than wrangling each quarter’s data with a mixture of tools on his laptop, Andy will use Trino to both clean up the raw data and produce the Drive Stats result set for publication.

Accessing the Drive Stats Parquet Dataset

When Greg and I started experimenting with Trino, our starting point was Brian Olsen’s Trino Getting Started GitHub repository, in particular, the Hive connector over MinIO file storage tutorial. Since MinIO and Backblaze B2 both have S3-compatible APIs, it was easy to adapt the tutorial’s configuration to target the Drive Stats data in Backblaze B2, and Brian was kind enough to accept my contribution of a new tutorial showing how to use the Hive connector over Backblaze B2 Cloud Storage. This tutorial will get you started using Trino with data stored in Backblaze B2 Buckets, and includes a section on accessing the Drive Stats dataset.

You might be interested to know that Backblaze is sponsoring this year’s Trino Summit, taking place virtually and in person in San Francisco, on November 10. Registration is free; if you do attend, come say hi to Greg and me at the Backblaze booth and see Trino in action, querying data stored in Backblaze B2.

The post Querying a Decade of Drive Stats Data appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

Lights, Camera, Custom Action (Part Two): Inside Integrating Frame.io + Backblaze B2

Post Syndicated from Pat Patterson original https://www.backblaze.com/blog/lights-camera-custom-action-part-two-inside-integrating-frame-io-backblaze-b2/

Part 2 in a series on the Frame.io/Backblaze B2 integration, covering the implementation. See Part 1 here, which covers the UI.

In Lights, Camera, Custom Action: Integrating Frame.io with Backblaze B2, we described a custom action for the Frame.io cloud-based media asset management (MAM) platform. The custom action allows users to export assets and projects from Frame.io to Backblaze B2 Cloud Storage and import them back from Backblaze B2 to Frame.io.

The custom action is implemented as a Node.js web service using the Express framework, and its complete source code is open-sourced under the MIT license in the backblaze-frameio GitHub repository. In this blog entry we’ll focus on how we secured the solution, how we made it deployable anywhere (including to options with free bandwidth), and how you can customize it to your needs.

What is a Custom Action?

Custom Actions are a way for you to build integrations directly into Frame.io as programmable UI components. This enables event-based workflows that can be triggered by users within the app, but controlled by an external system. You create custom actions in the Frame.io Developer Site, specifying a name (shown as a menu item in the Frame.io UI), URL, and Frame.io team, among other properties. The user sees the custom action in the contextual/right-click dropdown menu available on each asset.

When the user selects the custom action menu item, Frame.io sends an HTTP POST request to the custom action URL, containing the asset’s id. For example:

{
  "action_id": "2444cccc-7777-4a11-8ddd-05aa45bb956b",
  "interaction_id": "aafa3qq2-c1f6-4111-92b2-4aa64277c33f",
  "resource": {
    "type": "asset",
    "id": "9q2e5555-3a22-44dd-888a-abbb72c3333b"
  },
  "type": "my.action"
}

The custom action can optionally respond with a JSON description of a form to gather more information from the user. For example, our custom action needs to know whether the user wishes to export or import data, so its response is:

{
  "title": "Import or Export?",
  "description": "Import from Backblaze B2, or export to Backblaze B2?",
  "fields": [
    {
      "type": "select",
      "label": "Import or Export",
      "name": "copytype",
      "options": [
        {
          "name": "Export to Backblaze B2",
          "value": "export"
        },
        {
          "name": "Import from Backblaze B2",
          "value": "import"
        }
      ]
    }
  ]
}

When the user submits the form, Frame.io sends another HTTP POST request to the custom action URL, containing the data entered by the user. The custom action can respond with a form as many times as necessary to gather the data it needs, at which point it responds with a suitable message. For example, when it has all the information it needs to export data, our custom action indicates that an asynchronous job has been initiated:

{
  "title": "Job submitted!",
  "description": "Export job submitted for asset."
}
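The whole exchange boils down to a small decision: ask for more information, or report that the job has started. The actual custom action is written in Node.js; the Python sketch below only illustrates the flow. The function name is hypothetical, and I'm assuming that submitted form values arrive under a "data" key in the request body. The real action also gathers further details before starting an import, which this sketch skips.

```python
IMPORT_EXPORT_FORM = {
    "title": "Import or Export?",
    "description": "Import from Backblaze B2, or export to Backblaze B2?",
    "fields": [
        {
            "type": "select",
            "label": "Import or Export",
            "name": "copytype",
            "options": [
                {"name": "Export to Backblaze B2", "value": "export"},
                {"name": "Import from Backblaze B2", "value": "import"},
            ],
        }
    ],
}


def handle_custom_action(payload):
    """Return the JSON-serializable response for one POST from Frame.io."""
    # Assumption: form values submitted by the user arrive under "data"
    data = payload.get("data") or {}
    copytype = data.get("copytype")
    if copytype not in ("export", "import"):
        # First request (or an unusable answer): ask the user what to do
        return IMPORT_EXPORT_FORM
    # A real service would start the asynchronous copy job at this point
    return {
        "title": "Job submitted!",
        "description": f"{copytype.capitalize()} job submitted for asset.",
    }
```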

Securing the Custom Action

When you create a custom action in the Frame.io Developer Tools, a signing key is generated for it. The custom action code uses this key to verify that the request originates from Frame.io.

When Frame.io sends a POST request, it includes the following HTTP headers:

  • X-Frameio-Request-Timestamp: The time the custom action was triggered, in Epoch time (seconds since midnight UTC, Jan 1, 1970).
  • X-Frameio-Signature: The request signature.

The timestamp can be used to prevent replay attacks; Frame.io recommends that custom actions verify that this time is within five minutes of local time. The signature is an HMAC SHA-256 hash secured with the custom action’s signing key—a secret shared exclusively between Frame.io and the custom action. If the custom action is able to correctly verify the HMAC, then we know that the request came from Frame.io (message authentication) and it has not been changed in transit (message integrity).

The process for verifying the signature is:

  • Combine the signature version (currently “v0”), timestamp, and request body, separated by colons, into a string to be signed.
  • Compute the HMAC SHA-256 signature using the signing key.
  • If the computed signature and signature header are not identical, then reject the request.

The custom action’s verifyTimestampAndSignature() function implements the above logic, throwing an error if the timestamp is missing, outside the accepted range, or the signature is invalid. In all cases, 403 Forbidden is returned to the caller.
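The real function lives in the Node.js code in the repository. As a rough illustration of the same steps, here is a Python sketch built on the standard hmac module. The function names are mine, and I'm assuming the signature header carries a "v0=" prefix followed by the hex digest; check the Frame.io documentation for the exact header format before relying on it.

```python
import hashlib
import hmac
import time

MAX_SKEW_SECONDS = 5 * 60  # Frame.io recommends a five-minute window


def compute_signature(signing_key, timestamp, body):
    """Sign 'v0:<timestamp>:<body>' with HMAC SHA-256."""
    message = f"v0:{timestamp}:{body}".encode("utf-8")
    digest = hmac.new(signing_key.encode("utf-8"), message, hashlib.sha256).hexdigest()
    return f"v0={digest}"  # the 'v0=' prefix is an assumption in this sketch


def verify_timestamp_and_signature(signing_key, timestamp, signature, body, now=None):
    """Raise PermissionError unless the request looks like it came from Frame.io."""
    now = time.time() if now is None else now
    if timestamp is None or signature is None:
        raise PermissionError("missing timestamp or signature")
    if abs(now - int(timestamp)) > MAX_SKEW_SECONDS:
        raise PermissionError("timestamp outside accepted range")
    expected = compute_signature(signing_key, timestamp, body)
    # Constant-time comparison avoids leaking how many characters matched
    if not hmac.compare_digest(expected.encode("utf-8"), signature.encode("utf-8")):
        raise PermissionError("invalid signature")
```

A web service would catch the exception and return 403 Forbidden. Note that the body must be the raw request text exactly as received, since re-serializing the JSON could change the bytes and invalidate the signature.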

Custom Action Deployment Options

The root directory of the backblaze-frameio GitHub repository contains three directories, comprising two different deployment options and a directory containing common code:

  • node-docker: generic Node.js deployment
  • node-risingcloud: Rising Cloud deployment
  • backblaze-frameio-common: common code

The node-docker directory contains a generic Node.js implementation suitable for deployment on any Internet-addressable machine, for example, an Optimized Cloud Compute VM on Vultr. The app comprises an Express web service that handles requests from Frame.io, providing form responses to gather information from the user, and a worker task that the web service executes as a separate process to actually copy files between Frame.io and Backblaze B2.

You might be wondering why the web service doesn’t just do the work itself, rather than spinning up a separate process to do so. Well, media projects can contain dozens or even hundreds of files, containing a terabyte or more of data. If the web service were to perform the import or export, it would tie up resources and ultimately be unable to respond to Frame.io. Spinning up a dedicated worker process frees the web service to respond to new requests while the work is being done.
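The pattern isn't specific to Node.js. As a rough illustration in Python, with a stand-in worker that just prints its argument (a real worker would copy files between Frame.io and Backblaze B2), the service launches a child process and returns straight away:

```python
import subprocess
import sys


def start_worker(job_args):
    """Launch a separate Python process for a copy job and return immediately."""
    # Hypothetical worker code, used here only so the sketch is self-contained
    worker_code = "import sys; print('copying', sys.argv[1])"
    return subprocess.Popen(
        [sys.executable, "-c", worker_code, *job_args],
        stdout=subprocess.PIPE,
        text=True,
    )
```

Popen does not wait for the child to finish, so the request handler can send its “Job submitted!” response while the copy is still in progress.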

The downside of this approach is that you have to deploy the custom action on a machine capable of handling the peak expected load. The node-risingcloud implementation works identically to the generic Node.js app, but takes advantage of Rising Cloud’s serverless platform to scale elastically. A web service handles the form responses, then starts a task to perform the work. The difference here is that the task isn’t a process on the same machine, but a separate job running in Rising Cloud’s infrastructure. Jobs can be queued and new task instances can be started dynamically in response to rising workloads.

Note that since both Vultr and Rising Cloud are Backblaze Compute Partners, apps deployed on those platforms enjoy zero-cost downloads from Backblaze B2.

Customizing the Custom Action

We published the source code for the custom action to GitHub under the permissive MIT license. You are free to “use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software” as long as you include the copyright notice and MIT permission notice when you do so.

At present, the user must supply the name of a file when importing an asset from Backblaze B2, but it would be straightforward to add code to browse the bucket and allow the user to navigate the file tree. Similarly, it would be straightforward to extend the custom action to allow the user to import a whole tree of files based on a prefix such as raw_footage/2022-09-07. Feel free to adapt the custom action to your needs; we welcome pull requests for fixes and new features!

The post Lights, Camera, Custom Action (Part Two): Inside Integrating Frame.io + Backblaze B2 appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

Lights, Camera, Custom Action: Integrating Frame.io with Backblaze B2

Post Syndicated from Pat Patterson original https://www.backblaze.com/blog/lights-camera-custom-action-integrating-frame-io-with-backblaze-b2/

At Backblaze, we love hearing from our customers about their unique and varied storage needs. Our media and entertainment customers have some of the most interesting use cases and often tell us about their workflow needs moving assets at every stage of the process, from camera to post-production and everywhere in between.

The desire to have more flexibility controlling data movement in their media management systems is a consistent theme. In the interest of helping customers with not just storing their data, but using their data, today we are publishing a new open-source custom integration we have created for Frame.io. Read on to learn more about how to use Frame.io to streamline your media workflows.

What is Frame.io?

Frame.io, an Adobe company, has built a cloud-based media asset management (MAM) platform allowing creative professionals to collaborate at every step of the video production process. For example, videographers can upload footage from the set after each take; editors can work with proxy files transcoded by Frame.io to speed the editing process; and production staff can share sound reports, camera logs, and files like Color Decision Lists.

The Backblaze B2 Custom Action for Frame.io

Creative professionals who use Frame.io know that it can be a powerful tool for content collaboration. Many of those customers also leverage Backblaze B2 for long-term archive, and often already have large asset inventories in Backblaze B2 as well.

What our Backblaze B2 Custom Action for Frame.io does is quite simple: it allows you to quickly move data between Backblaze B2 and Frame.io. Media professionals can use the action to export selected assets or whole projects from Frame.io to B2 Cloud Storage, and then later import exported assets and projects from B2 Cloud Storage back to Frame.io.

How to Use the Backblaze B2 Custom Action for Frame.io

Let’s take a quick look at how to use the custom action.

After you enable the Custom Action, a new option appears in the asset context dropdown. Once you select the action, you are presented with a dialog to select Import or Export of data.

After selecting Export, you can choose whether you want just the single selected asset, or the entire project sent to Backblaze B2.

Once you make a selection, that’s it! The custom action handles the movement for you behind the scenes. The export is a point-in-time snapshot of the data from Frame.io—which remains as it was—to Backblaze B2.

The Custom Action creates a new exports folder in your B2 bucket, and then uploads the asset(s) to the folder. If you opt to upload the entire Project, it will be structured the same way it is organized in Frame.io.

How to Get Started With Backblaze B2 and Frame.io

To get started using the Custom Action described above, you will need:

  • A Frame.io account.
  • Access to a compute resource to run the custom action code.
  • A Backblaze B2 account.

If you don’t have a Backblaze B2 account yet, you can sign up here and get 10GB free, or contact us here to run a proof of concept with more than 10GB.

What’s Next?

We’ve written previously about similar open-sourced custom integrations for other tools, and by releasing this one we are continuing in that same spirit. If you are interested in learning more about this integration, you can jump straight to the source code on GitHub.

Watch this space for a follow-up post diving into more of the technical details. We’ll discuss how we secured the solution, made it deployable anywhere (including to options with free bandwidth), and how you can customize it to your needs.

We would love to hear your feedback on this integration, and also any other integrations you would like to see from Backblaze. Feel free to reach out to us in the comments below or through our social channels. We’re particularly active on Twitter and Reddit—let’s chat!

The post Lights, Camera, Custom Action: Integrating Frame.io with Backblaze B2 appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

Streaming Media with Backblaze B2: A Data Storage Guide

Post Syndicated from Pat Patterson original https://www.backblaze.com/blog/roll-camera-streaming-media-from-backblaze-b2/


You can store petabytes of audio and video assets in Backblaze B2 Cloud Storage, and lots of our customers do. While many customers archive their digital assets for long-term safekeeping, a growing number of customers use Backblaze B2 to deliver media assets to their end consumers, often embedded in web pages.

Embedding audio and video files in web pages for playback in the browser is nothing new, but there are a lot of ingredients in the mix, and it can be tricky to get right. Streaming media from Backblaze B2 simplifies sharing stored data. Whether you’re delivering content for web applications or supporting business workflows, Backblaze B2 offers a seamless way to store, protect, and share data.

After reading this blog post, you’ll be ready to deliver media assets from Backblaze B2 to website users reliably and affordably. I’ll cover:

  • A little bit of history on how streaming media came to be.
  • A primer on the various strands of technology and how they work.
  • A how-to guide for streaming media from your Backblaze B2 account.

The evolution of internet media streaming

Back in the early days of the web, audio and video content was a rarity. Most people connected to the internet via a dial-up link, and just didn’t have the bandwidth to stream audio, let alone video, content to their computer. Consequently, the early web standards specified how browsers should show images in web pages via the <img> tag, but made no mention of audio/visual resources. Digital storage media like floppy disks and magnetic disks were often used for storing and sharing multimedia content offline.

As bandwidth increased to the point where it was possible for more of us to stream large media files, Adobe’s Flash Player became the de facto standard for playing audio and video in the web browser. Flash allowed websites to embed and stream media files directly, revolutionizing the way content was consumed online. When YouTube launched, for example, in early 2005, it required the Flash Player plug-in to be installed in the browser to view the videos.

However, reliance on Flash presented limitations, including security vulnerabilities and compatibility issues across devices. As a result, web developers began exploring native solutions for streaming media. This resulted in a significant milestone in digital data delivery, as browsers and devices started incorporating built-in capabilities for playing media.

Meanwhile, the evolution of data storage media, including external hard drives, solid-state drives (SSDs), and cloud storage solutions, allowed users to store and access large media libraries more efficiently.

The HTML5 video element: Transforming online media playback

At around the same time, a consortium of the major browser vendors started work on a new version of HTML, the markup language that had been a part of the web since its inception. A major goal of HTML5 was to support multimedia content natively, and so, in its initial release in 2008, the specification introduced new <audio> and <video> tags to embed audiovisual content directly in web pages, without requiring additional plugins like Flash.

This development was a game-changer in digital storage media and content delivery, as it eliminated the need for third-party software and made streaming video and audio more accessible across devices. The shift also facilitated better cross-platform compatibility, a critical feature for supporting consumer devices such as smartphones, tablets, and smart TVs.

While web pages are written in HTML, they are delivered from the web server to the browser via the HTTP protocol. Web servers don’t just deliver web pages, of course—images, scripts, audio, and video files are also delivered via HTTP.

HTML5’s multimedia capabilities also introduced support for various codecs, enabling better compression and playback quality for formats like MP4. This innovation marked a significant step forward in how websites handled and delivered rich media experiences.

How streaming technology works

Understanding the key components of streaming technology will help you set up seamless digital data delivery on your site. Here, we’ll cover:

  • Streaming vs. progressive download.
  • HTTP 1.1 byte range serving.
  • Media file formats.
  • MIME types.

Streaming vs. progressive download

In common usage, the term, “streaming,” in the context of web media, can refer to any situation where the user can request content (for example, press a play button) and consume that content almost immediately, as opposed to downloading a media file, where the user has to wait to receive the entire file before they can watch or listen. This is particularly useful for media files stored on data storage media such as solid-state drives or cloud-based storage media.

Technically, however, the term “streaming” refers to a continuous delivery method that uses transport protocols such as RTSP rather than HTTP. This form of streaming requires specialized software to handle data traffic in real time.

Progressive download blends aspects of downloading and streaming. When the user presses play on a video on a web page, the browser starts to download the video file. However, the browser may begin playback before the download is complete. So, the user experience of progressive download is much the same as streaming, and I’ll use the term, “streaming” in its colloquial sense in this blog post. Progressive downloads rely on efficient data storage technology and protocols like HTTP to deliver content to consumer devices, such as laptops or smartphones.

Both streaming and progressive download are widely used in cloud environments today, and enable websites to serve media content directly from servers—whether local or in the cloud.

HTTP 1.1 byte range serving

HTTP enables progressive download through a feature known as range serving. Introduced to HTTP in version 1.1 back in 1997, byte range serving allows an HTTP client, such as your browser, to request a specific range of bytes from a resource, such as a video file, rather than the entire resource all at once.

Imagine you’re watching a video online and realize you’ve already seen the first half. You can click the video’s slider control, picking up the action at the appropriate point. Without byte range serving, your browser would be downloading the whole video, and you might have to wait several minutes for it to reach the halfway point and start playing. With byte range serving, the browser can specify a range of bytes in each request, so it’s easy for the browser to request data from the middle of the video file, skipping any amount of content almost instantly.

Byte range serving significantly enhances user experience and optimizes network bandwidth. It’s especially beneficial when serving large media files stored in cloud storage.

Backblaze B2 supports byte range serving in downloads via both the Backblaze B2 Native and S3 Compatible APIs. (Check out this post for an explainer of the differences between the two.)

Here’s an example range request for the first 10 bytes of a file in a Backblaze B2 bucket, using the cURL command line tool. 

You can see the Range header in the request, specifying bytes zero to nine, and the Content-Range header indicating that the response indeed contains bytes zero to nine of a total of 555,214,865 bytes. Note also the HTTP status code: 206, signifying a successful retrieval of partial content, rather than the usual 200.

% curl -I https://metadaddy-public.s3.us-west-004.backblazeb2.com/example.mp4 -H 'Range: bytes=0-9'

HTTP/1.1 206
Accept-Ranges: bytes
Last-Modified: Tue, 12 Jul 2022 20:06:09 GMT
ETag: "4e104e1bd9a2111002a74c9c798515e6-106"
Content-Range: bytes 0-9/555214865
x-amz-request-id: 1e90f359de28f27a
x-amz-id-2: aMYY1L2apOcUzTzUNY0ZmyjRRZBhjrWJz
x-amz-version-id: 4_zf1f51fb913357c4f74ed0c1b_f202e87c8ea50bf77_d20220712_m200609_c004_v0402006_t0054_u01657656369727
Content-Type: video/mp4
Content-Length: 10
Date: Tue, 12 Jul 2022 20:08:21 GMT
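If you're making range requests from code rather than the command line, the two headers are easy to build and parse yourself. Here is a minimal Python sketch using only the standard library; the helper names are made up for illustration, and the request is built but not sent.

```python
import re
import urllib.request


def range_request(url, first, last=None):
    """Build (but do not send) a request for bytes first..last, inclusive."""
    end = "" if last is None else str(last)  # an open-ended range runs to the end
    return urllib.request.Request(url, headers={"Range": f"bytes={first}-{end}"})


_CONTENT_RANGE = re.compile(r"^bytes (\d+)-(\d+)/(\d+|\*)$")


def parse_content_range(value):
    """Return (first, last, total) from a Content-Range header; total may be None."""
    match = _CONTENT_RANGE.match(value.strip())
    if match is None:
        raise ValueError(f"unrecognized Content-Range: {value!r}")
    first, last, total = match.groups()
    return int(first), int(last), None if total == "*" else int(total)


print(parse_content_range("bytes 0-9/555214865"))
```

Passing the request to urllib.request.urlopen() would send it; for a partial response, you would expect the response's status attribute to be 206.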

I recommend that you use S3-style URLs for media content, as shown in the above example, rather than Backblaze B2-style URLs of the form: https://f004.backblazeb2.com/file/metadaddy-public/example.mp4.

The B2 Native API responds to a range request that specifies the entire content, e.g., Range: bytes=0-, with HTTP status 200, rather than 206. Safari interprets that response as indicating that Backblaze B2 does not support range requests, and thus will not start playing content until the entire file is downloaded.

The S3 Compatible API returns HTTP status 206 for all range requests, regardless of whether they specify the entire content, so Safari will allow you to play the video as soon as the page loads.

Media file formats for storage and streaming

The third ingredient in streaming media successfully is the file format. There are several container formats for audio and video data, with familiar file name extensions such as .mov, .mp4, and .avi. Within these containers, media data can be encoded in many different ways by software components known as codecs, an abbreviation of coder/decoder.

Codecs play a critical role in digital storage media. They compress and decompress the data to optimize storage capacity while preserving playback quality. Choosing the right codec ensures efficient long-term storage and seamless delivery of high-quality media.

We could write a whole series of blog articles on containers and codecs, but the important point is that the media’s metadata—information regarding how to play the media, such as its length, bit rate, dimensions, and frames per second—must be located at the beginning of the video file, so that this information is immediately available as download starts. This optimization is known as “Fast Start” and is supported by software such as ffmpeg and Premiere Pro. Without this, there might be playback delays, negatively impacting the user experience.
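
To make Fast Start a little more concrete: in MP4 and MOV files, the metadata lives in a box named moov and the media data in a box named mdat (a detail of the container format, not something this post covers). The following Python sketch, with hypothetical function names, walks the top-level boxes and reports whether moov comes before mdat. It operates on bytes already in memory, so treat it as an illustration rather than a tool for multi-gigabyte files.

```python
import struct


def box(box_type: bytes, payload: bytes) -> bytes:
    """Build a minimal box: 4-byte big-endian size, 4-byte type, then payload."""
    return struct.pack(">I4s", 8 + len(payload), box_type) + payload


def top_level_boxes(data: bytes) -> list:
    """Return the types of the top-level boxes in the order they appear."""
    types, offset = [], 0
    while offset + 8 <= len(data):
        size, box_type = struct.unpack(">I4s", data[offset:offset + 8])
        if size == 1 and offset + 16 <= len(data):
            # A size of 1 means the real size is a 64-bit value after the type
            (size,) = struct.unpack(">Q", data[offset + 8:offset + 16])
        elif size == 0:
            # A size of 0 means the box extends to the end of the file
            size = len(data) - offset
        types.append(box_type.decode("latin-1"))
        if size < 8:
            break  # Malformed or truncated box: stop rather than loop forever
        offset += size
    return types


def is_fast_start(data: bytes) -> bool:
    """True if the metadata box (moov) precedes the media data box (mdat)."""
    types = top_level_boxes(data)
    if "moov" not in types or "mdat" not in types:
        return False
    return types.index("moov") < types.index("mdat")


# Synthetic files: the same boxes in two different orders
fast = box(b"ftyp", b"isom") + box(b"moov", b"\0" * 4) + box(b"mdat", b"\0" * 16)
slow = box(b"ftyp", b"isom") + box(b"mdat", b"\0" * 16) + box(b"moov", b"\0" * 4)
print(top_level_boxes(fast), is_fast_start(fast), is_fast_start(slow))
```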

Understanding MIME types for reliable media playback

The final piece of the puzzle is the media file’s MIME type, which identifies the file format for the browser or media player. You can see a MIME type in the Content-Type header in the above example request: video/mp4. You must specify the MIME type when you upload a file to Backblaze B2. You can set it explicitly, or use the special value b2/x-auto to tell Backblaze B2 to set the MIME type according to the file name’s extension, if one is present. It is important to set the MIME type correctly for reliable playback.
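
One way to choose the value to send at upload time is sketched below in Python. It guesses the MIME type locally from the file name using the standard library, and falls back to b2/x-auto when no guess is available. The function name is hypothetical, and the upload call itself is omitted.

```python
import mimetypes

# Special value that asks Backblaze B2 to pick the type from the extension
B2_AUTO_CONTENT_TYPE = "b2/x-auto"


def content_type_for_upload(file_name: str) -> str:
    """Return a MIME type for file_name, or b2/x-auto if none can be guessed."""
    guessed, _encoding = mimetypes.guess_type(file_name)
    return guessed or B2_AUTO_CONTENT_TYPE


for name in ("example.mp4", "notes.zzzunknown"):
    print(name, content_type_for_upload(name))
```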

For those managing high-capacity storage, getting MIME types right helps streamline delivery and maintain compatibility with newer software and streaming protocols.

Putting it all together

So, we’ve covered the ingredients for streaming media from Backblaze B2 directly to a web page:

  • The HTML5 <audio> and <video> elements.
  • HTTP 1.1 byte range serving.
  • Encoding media for Fast Start.
  • Storing media files in Backblaze B2 with the correct MIME type.

Here’s an HTML5 page with a minimal example of an embedded video file:

<!DOCTYPE html>
<html>
<body>
<h1>Video</h1>
<video controls src="my-video.mp4" width="640"></video>
</body>
</html>

The controls attribute tells the browser to show the default set of controls for playback. Setting the width of the video element makes it a more manageable size than the default, which is the video’s dimensions. This short video shows the video element in action:

Managing download costs with efficient data storage solutions

When serving media files from your account, you need to consider download charges as part of your overall data management strategy. Backblaze offers a few ways to manage these charges. To start, the first 1GB of data downloaded from your Backblaze B2 account per day is free. After that, we charge $0.01/GB—notably less than AWS at $0.05+/GB, Azure at $0.04+, and Google Cloud Platform at $0.12.
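
As a back-of-the-envelope illustration of the figures above, this Python sketch estimates the download charge for a single day. The function name is hypothetical, and it deliberately ignores billing details not covered here, such as rounding and CDN partner traffic.

```python
FREE_GB_PER_DAY = 1.0  # first 1GB downloaded per day is free
PRICE_PER_GB = 0.01    # dollars per GB after the free allowance


def daily_download_charge(gb_downloaded: float) -> float:
    """Estimate one day's download charge in dollars."""
    billable_gb = max(0.0, gb_downloaded - FREE_GB_PER_DAY)
    return billable_gb * PRICE_PER_GB


# Example: 251GB served in one day leaves 250GB billable
print(f"${daily_download_charge(251):.2f}")
```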

We also cover the download fees between Backblaze B2 and many CDN partners like Cloudflare, Fastly, and Bunny.net, so you can serve content closer to your end users via their edge networks. You’ll want to make sure you understand if there are limits on your media downloads from those vendors by checking the terms of service for your CDN account. Some service levels do restrict downloads of media content.

Start streaming: Your media storage journey begins

Now you know everything you need to know to get started encoding, uploading, and serving audio/visual content from Backblaze B2 Cloud Storage. Backblaze B2 is a great way to experiment with multimedia—the first 10GB of storage is free, and Backblaze pricing includes free egress per month. 

Sign up free, no credit card required, and start building your long-term backup and streaming infrastructure with Backblaze B2.

The post Streaming Media with Backblaze B2: A Data Storage Guide appeared first on Backblaze Blog | Cloud Storage & Cloud Backup

Optimize Your Media Production Workflow With iconik, LucidLink, and Backblaze B2

Post Syndicated from Pat Patterson original https://www.backblaze.com/blog/optimize-your-media-production-workflow-with-iconik-lucidlink-and-backblaze-b2/

In late April, thousands of professionals from all corners of the media, entertainment, and technology ecosystem assembled in Las Vegas for the National Association of Broadcasters trade show, better known as the NAB Show. We were delighted to sponsor NAB after its two-year hiatus due to COVID-19. Our staff came in blazing hot and ready to hit the trade show floor.

One of the stars of the 2022 event was Backblaze partner LucidLink, named a Cloud Computing and Storage category winner in the NAB Show Product of the Year Awards. In this blog post, I’ll explain how to combine LucidLink’s Filespaces product with Backblaze B2 Cloud Storage and media asset management from iconik, another Backblaze partner, to optimize your media production workflow. But first, some context…

How iconik, LucidLink, and Backblaze B2 Fit in a Media Storage Architecture

The media and entertainment industry has always been a natural fit for Backblaze. Some of our first Backblaze Computer Backup customers were creative professionals looking to protect their work, and the launch of Backblaze B2 opened up new options for archiving, backing up, and distributing media assets.

As the media and entertainment industry moved to 4K Ultra HD for digital video recording over the past few years, file sizes ballooned. An hour of high-quality 4K video shot at 60 frames per second can require up to one terabyte of storage. Backblaze B2 matches well with today’s media and entertainment storage demands, as customers such as Fortune Media, Complex Networks, and Alton Brown of “Good Eats” fame have discovered.

Alongside Backblaze B2, an ecosystem of tools has emerged to help professionals manage their media assets, including iconik and LucidLink. iconik’s cloud-native media management and collaboration solution gathers and organizes media securely from a wide range of locations, including Backblaze B2. iconik can scan and index content from a Backblaze B2 bucket, creating an asset for each file. An iconik asset can combine a lower resolution proxy with a link to the original full-resolution file in Backblaze B2. For a large part of the process, the production team can work quickly and easily with these proxy files, previewing and selecting clips and editing them into a sequence.

Complementing iconik and B2 Cloud Storage, LucidLink provides a high-performance, cloud-native, network-attached storage (NAS) solution that allows professionals to collaborate on files stored in the cloud almost as if the files were on their local machine. With LucidLink, a production team can work with multi-terabyte 4K resolution video files, making final edits and rendering the finished product at full resolution.

It’s important to understand that the video editing process is non-destructive. The original video files are immutable—they are never altered during the production process. As the production team “edits” a sequence, they are actually creating a series of transformations that are applied to the original videos as the final product is rendered.

You can think of B2 Cloud Storage and LucidLink as tiers in a media storage architecture. Backblaze B2 excels at cost-effective, durable storage of full-resolution video assets through their entire lifetime from acquisition to archive, while LucidLink shines during the later stages of the production process, from when the team transitions to working with the original full-resolution files to the final rendering of the sequence for release.

iconik brings B2 Cloud Storage and LucidLink together; not only can an iconik asset include a proxy and links to copies of the original video in both B2 Cloud Storage and LucidLink, iconik Storage Gateway can copy the original file from Backblaze B2 to LucidLink when full-resolution work commences, and later delete the LucidLink copy at the end of the production process, leaving the original archived in Backblaze B2. All that’s missing is a little orchestration.

The Backblaze B2 Storage Plugin for iconik

The Backblaze B2 Storage Plugin for iconik allows creative professionals to copy files from B2 Cloud Storage to LucidLink, and later delete them from LucidLink, in a couple of mouse clicks. The plugin adds a pair of custom actions to iconik: “Add to LucidLink” and “Remove from LucidLink,” applicable to one or many assets or collections, accessible from the Search page and the Asset/Collection page. You can see them on the lower right of this screenshot:

The user experience could hardly be simpler, but there is a lot going on under the covers.

There are several components involved:

  • The plugin, deployed as a serverless function. The initial version of the plugin is written in Python for deployment on Google Cloud Functions, but it could easily be adapted for other serverless cloud platforms.
  • A LucidLink Filespace.
  • A machine with both the LucidLink client and iconik Storage Gateway installed. The iconik Storage Gateway accesses the LucidLink Filespace as if it were local file storage.
  • iconik, accessed both by the user via its web interface and by the plugin via the iconik API. iconik is configured with two iconik “storages”, one for Backblaze B2 and one for the iconik Storage Gateway instance.

When the user selects the “Add to LucidLink” custom action, iconik sends an HTTP request, containing the list of selected entities, to the plugin. The plugin calls the iconik API with a request to copy those entities from Backblaze B2 to the iconik Storage Gateway. The gateway writes the files to the LucidLink Filespace, exactly as if it were writing to the local disk, and the LucidLink client sends the files to LucidLink. Now the full-resolution files are available for the production team to access in the Filespace, while the originals remain in B2 Cloud Storage.

Later, when the user selects the “Remove from LucidLink” custom action, iconik sends another HTTP request containing the list of selected entities to the plugin. This time, the plugin has more work to do. Collections can contain other collections as well as assets, so the plugin must access each collection in turn, calling the iconik API for each file in the collection to request that it be deleted from the iconik Storage Gateway. The gateway simply deletes each file from the Filespace, and the LucidLink client relays those operations to LucidLink. Now the files are no longer stored in the Filespace, but the originals remain in B2 Cloud Storage, safely archived for future use.
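
The nested-collection traversal described above can be sketched in Python as follows. This is not the plugin's actual code: the in-memory contents dictionary is a hypothetical stand-in for the iconik API responses, and the per-file delete requests are left out. It only shows how assets can be gathered from collections that contain other collections.

```python
def assets_in_collection(collection_id, contents, visited=None):
    """Return the asset ids reachable from a collection, depth first.

    contents maps a collection id to a list of (kind, id) pairs, where
    kind is either "asset" or "collection" (a hypothetical structure).
    """
    if visited is None:
        visited = set()
    if collection_id in visited:
        return []  # Already processed; guards against circular nesting
    visited.add(collection_id)

    asset_ids = []
    for kind, child_id in contents.get(collection_id, []):
        if kind == "collection":
            # Recurse into the nested collection
            asset_ids.extend(assets_in_collection(child_id, contents, visited))
        else:
            asset_ids.append(child_id)
    return asset_ids


# Hypothetical data: "dailies" is nested in "project" and refers back to it
contents = {
    "project": [("asset", "a1"), ("collection", "dailies")],
    "dailies": [("asset", "a2"), ("asset", "a3"), ("collection", "project")],
}
print(assets_in_collection("project", contents))
```

Each id in the resulting list would then be the subject of one delete request to the iconik Storage Gateway storage.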

This short video shows the plugin in action, and walks through the flow in a little more detail:

Deploying the Backblaze B2 Storage Plugin for iconik

The plugin is available open-source under the MIT license at https://github.com/backblaze-b2-samples/b2-iconik-plugin. Full deployment instructions are included in the plugin’s README file.

Don’t have a Backblaze B2 account? You can get started here, and the first 10GB are on us. We can also set up larger scale trials involving terabytes of storage—enter your details and we’ll get back to you right away.

Customize the Plugin to Your Requirements

You can use the plugin as is, or modify it to your requirements. For example, the plugin is written to be deployed on Google Cloud Functions, but you could adapt it to another serverless cloud platform. Please report any issues with the plugin via the issues tab in the GitHub repository, and feel free to submit contributions via pull requests.

The post Optimize Your Media Production Workflow With iconik, LucidLink, and Backblaze B2 appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

Looking Forward to Backblaze Cloud Replication: Everything You Need to Know

Post Syndicated from Pat Patterson original https://www.backblaze.com/blog/looking-forward-to-backblaze-cloud-replication-everything-you-need-to-know/

Backblaze Cloud Replication—currently in private beta—enables Backblaze customers to store files in multiple regions, or create multiple copies of files in one region, across the Backblaze Storage Cloud. This capability, as we explained in an earlier blog post, allows you to create geographically separate copies of data for compliance and continuity, keep data closer to its consumers, or maintain a live copy of production data for testing and staging. Today we’ll look at how you can get started with Cloud Replication, so you’ll be ready for its release, likely early next month.

Backblaze Cloud Replication: The Basics

Backblaze B2 Cloud Storage organizes data into files (equivalent to Amazon S3’s objects) in buckets. Very simply, Cloud Replication allows you to create rules that control replication of files from a source bucket to a destination bucket. The source and destination buckets can be in the same or different accounts, or in the same or different regions.

Here’s a simple example: Suppose I want to replicate files from my-production-bucket to my-staging-bucket in the same account, so I can run acceptance tests on an application with real-life data. Using either the Backblaze web interface or the B2 Native API, I would simply create a Cloud Replication rule specifying the source and destination buckets in my account. Let’s walk through a couple of examples in each interface.

Cloud Replication via the Web Interface

Log in to the account containing the source bucket for your replication rule. Note that the account must have a payment method configured to participate in replication. Cloud Replication will be accessible via a new item in the B2 Cloud Storage menu on the left of the web interface:

Clicking Cloud Replication opens a new page in the web interface:

Click Replicate Your Data to create a new replication rule:

Configuring Replication Within the Same Account

To implement the simple rule, “replicate files from my-production-bucket to my-staging-bucket in the same account,” all you need to do is select the source bucket, set the destination region the same as the source region, and select or create the destination bucket:

Configuring Replication to a Different Account

To replicate data via the web interface to a different account, you must be able to log in to the destination account. Click Authenticate an existing account to log in. Note that the destination account must be enabled for Backblaze B2 and, again, must have a payment method configured:

After authenticating, you must select a bucket in the destination account. The process is the same whether the destination account is in the same or a different region:

Note that, currently, you may configure a bucket as a source in a maximum of two replication rules. A bucket can be configured as a destination in any number of rules.

Once you’ve created the rule, it is accessible via the web interface. You can pause a running rule, run a paused rule, or delete the rule altogether:

Replicating Data

Once you have created the replication rule, you can manipulate files in the source bucket as you normally would. By default, existing files in the source bucket will be copied to the destination bucket. New files, and new versions of existing files, in the source bucket will be replicated regardless of whether they are created via the Backblaze S3 Compatible API, the B2 Native API, or the Backblaze web interface. Note that the replication engine runs on a distributed system, so the time to complete replication is based on the number of other replication jobs scheduled, the number of files to replicate, and the size of the files to replicate.

Checking Replication Status

Click on a source or destination file in the web interface to see its details page. The file’s replication status is at the bottom of the list of attributes:

There are four possible values of replication status:

  • pending: The file is in the process of being replicated. If there are two rules, at least one of the rules is processing. (Reminder: Currently, you may configure a bucket as a source in a maximum of two replication rules.) Check again later to see if it has left this status.
  • completed: This status represents a successful replication. If two rules are configured, both rules have completed successfully.
  • failed: A non-recoverable error has occurred, such as insufficient permissions to write the file into the destination bucket. The system will not try again to process this file. If two rules are configured, at least one has failed.
  • replica: This file was created by the replication process. Note that replica files cannot be used as the source for further replication.

Cloud Replication and Application Keys

There’s one more detail to examine in the web interface before we move on to the API. Creating a replication rule creates up to two Application Keys: one with read permissions for the source bucket, if the source bucket is not already associated with an Application Key, and one with write permissions for the destination bucket.

The keys are visible in the App Keys page of the web interface:

You don’t need to worry about these keys if you are using the web interface, but it is useful to see how the pieces fit together if you are planning to go on to use the B2 Native API to configure Cloud Replication.

This short video walks you through setting up Cloud Replication in the web interface:

Cloud Replication via the B2 Native API

Configuring Cloud Replication in the web interface is quick and easy for a single rule, but becomes burdensome if you have to set up many replication rules. The B2 Native API allows you to create replication rules programmatically, enabling automation and providing access to two features not currently accessible via the web interface: setting a prefix to constrain the set of files to be replicated, and excluding existing files from the replication rule.

Configuring Replication

To create a replication rule, you must include replicationConfiguration when you call b2_create_bucket or b2_update_bucket. The source bucket’s replicationConfiguration must contain asReplicationSource, and the destination bucket’s replicationConfiguration must contain asReplicationDestination. Note that both can be present where a given bucket is the source in one replication rule and the destination in another.

Let’s illustrate the process with a concrete example. Let’s say you want to replicate newly created files with the prefix master_data/, and new versions of those files, from a bucket in the U.S. West region to one in the EU Central region so that you have geographically separate copies of that data. You don’t want to replicate any files that already exist in the source bucket.

Assuming the buckets already exist, you would first create a pair of Application Keys: one in the source account, with read permissions for the source bucket, and another in the destination account, with write permissions for the destination bucket.

Next, call b2_update_bucket with the following message body to configure the source bucket:

{
    "accountId": "<source account id>",
    "bucketId": "<source bucket id>",
    "replicationConfiguration": {
        "asReplicationSource": {
            "replicationRules": [
                {
                    "destinationBucketId": "<destination bucket id>",
                    "fileNamePrefix": "master_data/",
                    "includeExistingFiles": false,
                    "isEnabled": true,
                    "priority": 1,
                    "replicationRuleName": "replicate-master-data"
                }
            ],
            "sourceApplicationKeyId": "<source application key id>"
        }
    }
}

Finally, call b2_update_bucket with the following message body to configure the destination bucket:

{
  "accountId": "<destination account id>",
  "bucketId": "<destination bucket id>",
  "replicationConfiguration": {
    "asReplicationDestination": {
      "sourceToDestinationKeyMapping": {
        "<source application key id>": "<destination application key id>"
      }
    },
    "asReplicationSource": null
  }
}
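
If you have many rules to set up, the two message bodies above could be generated rather than written by hand. The Python sketch below builds them as dictionaries and serializes them to JSON. The function names and all of the ids are hypothetical placeholders, and sending the bodies to b2_update_bucket (including authorization) is not shown.

```python
import json


def source_bucket_body(account_id, bucket_id, destination_bucket_id,
                       source_key_id, rule_name, prefix="",
                       include_existing=False):
    """Message body that configures a bucket as a replication source."""
    return {
        "accountId": account_id,
        "bucketId": bucket_id,
        "replicationConfiguration": {
            "asReplicationSource": {
                "replicationRules": [{
                    "destinationBucketId": destination_bucket_id,
                    "fileNamePrefix": prefix,
                    "includeExistingFiles": include_existing,
                    "isEnabled": True,
                    "priority": 1,
                    "replicationRuleName": rule_name,
                }],
                "sourceApplicationKeyId": source_key_id,
            }
        },
    }


def destination_bucket_body(account_id, bucket_id, source_key_id,
                            destination_key_id):
    """Message body that configures a bucket as a replication destination."""
    return {
        "accountId": account_id,
        "bucketId": bucket_id,
        "replicationConfiguration": {
            "asReplicationDestination": {
                "sourceToDestinationKeyMapping": {
                    source_key_id: destination_key_id,
                }
            },
            "asReplicationSource": None,  # Serialized as JSON null
        },
    }


# All ids below are made-up placeholders
source = source_bucket_body("SRC_ACCOUNT", "SRC_BUCKET", "DST_BUCKET",
                            "SRC_KEY", "replicate-master-data",
                            prefix="master_data/")
destination = destination_bucket_body("DST_ACCOUNT", "DST_BUCKET",
                                      "SRC_KEY", "DST_KEY")
print(json.dumps(source, indent=2))
print(json.dumps(destination, indent=2))
```

Note that the source Application Key id appears in both bodies: it is how the destination bucket maps an incoming replication request to the key that is allowed to write.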

You can check your work in the web interface:

Note that the “file prefix” and “include existing files” configuration is not currently visible in the web interface.

Viewing Replication Rules

If you are planning to use the B2 Native API to set up replication rules, it’s a good idea to experiment with the web interface first and then call b2_list_buckets to examine the replicationConfiguration property.

Here’s an extract of the configuration of a bucket that is both a source and destination:

{
  "accountId": "e92db1923dce",
  "bucketId": "2e2982ddebf12932830d0c1e",
  ...
  "replicationConfiguration": {
    "isClientAuthorizedToRead": true,
    "value": {
      "asReplicationDestination": {
        "sourceToDestinationKeyMapping": {
          "000437047f876700000000005": "003e92db1923dce0000000004"
        }
      },
      "asReplicationSource": {
        "replicationRules": [
          {
            "destinationBucketId": "0463b7a0a467fff877f60710",
            "fileNamePrefix": "",
            "includeExistingFiles": true,
            "isEnabled": true,
            "priority": 1,
            "replicationRuleName": "replication-eu-to-us"
          }
        ],
        "sourceApplicationKeyId": "003e92db1923dce0000000003"
      }
    }
  },
  ...
}
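
Once you have a bucket entry like the extract above, pulling out the rules takes only a few lines. The Python sketch below uses hypothetical helper names and a trimmed copy of the extract as sample data. It assumes the entry has already been fetched with b2_list_buckets and decoded from JSON.

```python
def replication_settings(bucket: dict) -> dict:
    """Return the 'value' object of replicationConfiguration, or {}."""
    config = bucket.get("replicationConfiguration") or {}
    return config.get("value") or {}


def replication_rules(bucket: dict) -> list:
    """Return the rules for which this bucket is the source."""
    source = replication_settings(bucket).get("asReplicationSource") or {}
    return source.get("replicationRules") or []


def is_replication_destination(bucket: dict) -> bool:
    """True if the bucket accepts replicated files from at least one key."""
    dest = replication_settings(bucket).get("asReplicationDestination") or {}
    return bool(dest.get("sourceToDestinationKeyMapping"))


# Sample data trimmed from the extract above
bucket = {
    "bucketId": "2e2982ddebf12932830d0c1e",
    "replicationConfiguration": {
        "isClientAuthorizedToRead": True,
        "value": {
            "asReplicationDestination": {
                "sourceToDestinationKeyMapping": {
                    "000437047f876700000000005": "003e92db1923dce0000000004"
                }
            },
            "asReplicationSource": {
                "replicationRules": [{
                    "destinationBucketId": "0463b7a0a467fff877f60710",
                    "fileNamePrefix": "",
                    "includeExistingFiles": True,
                    "isEnabled": True,
                    "priority": 1,
                    "replicationRuleName": "replication-eu-to-us",
                }],
                "sourceApplicationKeyId": "003e92db1923dce0000000003",
            },
        },
    },
}

for rule in replication_rules(bucket):
    print(rule["replicationRuleName"], "->", rule["destinationBucketId"])
print("destination:", is_replication_destination(bucket))
```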

Checking a File’s Replication Status

To see the replication status of a file, including whether the file is itself a replica, call b2_get_file_info and examine the replicationStatus field. For example, looking at the same file as in the web interface section above:

{
  ...
  "bucketId": "548377d0a467fff877f60710",
  ...
  "fileId": "4_z548377d0a467fff877f60710_f115587450d2c8336_d20220406_m162741_c000_v0001066_t0046_u01649262461427",
  ...
  "fileName": "Logo Slide.png",
  ...
  "replicationStatus": "completed",
  ...
}
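
A script that polls b2_get_file_info might interpret the field along these lines. This Python sketch uses a hypothetical function name and paraphrases the four status values described earlier. How a file with no replicationStatus field should be treated is an assumption.

```python
STATUS_MEANINGS = {
    "pending": "replication is still in progress; check again later",
    "completed": "all configured rules replicated the file successfully",
    "failed": "a non-recoverable error occurred; it will not be retried",
    "replica": "this file was created by the replication process",
}


def describe_replication(file_info: dict) -> str:
    """Return a one-line description of a file's replication status."""
    status = file_info.get("replicationStatus")
    if status is None:
        # Assumption: no status means no replication rule applies to the file
        return "none: no replication status reported"
    meaning = STATUS_MEANINGS.get(status, "unrecognized status")
    return f"{status}: {meaning}"


# Sample data based on the response above
file_info = {"fileName": "Logo Slide.png", "replicationStatus": "completed"}
print(describe_replication(file_info))
```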

This short video runs through the various API calls:

How Much Will This Cost?

The majority of fees for Cloud Replication are identical to standard B2 Cloud Storage billing: You pay for the total data you store, replication (download) fees, and for any related transaction fees. For details regarding billing, click here.

The replication fee is only incurred between cross-regional accounts. For example, a source in the U.S. West and a destination in EU Central would incur replication fees, which are priced identically to our standard download fee. If the replication rule is created within a region—for example, both source and destination are located in our U.S. West region—there is no replication fee.

How to Start Replicating

Watch the Backblaze Blog for an announcement when we make Backblaze Cloud Replication generally available (GA), likely early next month. As mentioned above, you will need to set up a payment method on accounts included in replication rules. If you don’t yet have a Backblaze B2 account, or you need to set up a Backblaze B2 account in a different region from your existing account, sign up here and remember to select the region from the dropdown before hitting “Sign Up for Backblaze B2.”

The post Looking Forward to Backblaze Cloud Replication: Everything You Need to Know appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

Go Serverless with Rising Cloud and Backblaze B2

Post Syndicated from Pat Patterson original https://www.backblaze.com/blog/go-serverless-with-rising-cloud-and-backblaze-b2/

In my last blog post, I explained how to use a Cloudflare Worker to send notifications on Backblaze B2 events. That post focused on how a Worker could proxy requests to Backblaze B2 Cloud Storage, sending a notification to a webhook at Pipedream that logged each request to a Google Spreadsheet.

Developers integrating applications and solutions with Backblaze B2 can use the same technique to solve a wide variety of use cases. As an example, in this blog post, I’ll explain how you can use that same Cloudflare Worker to trigger a serverless function at our partner Rising Cloud that automatically creates thumbnails as images are uploaded to a Backblaze B2 bucket, without incurring any egress fees for retrieving the full-size images.

What is Rising Cloud?

Rising Cloud hosts customer applications on a cloud platform that it describes as Intelligent-Workloads-as-a-Service. You package your application as a Linux executable or a Docker-style container, and Rising Cloud provisions instances as your application receives HTTP requests. If you’re familiar with AWS Lambda, Rising Cloud satisfies the same set of use cases while providing more intelligent auto-scaling, greater flexibility in application packaging, multi-cloud resiliency, and lower cost.

Rising Cloud’s platform uses artificial intelligence to predict when your application is expected to receive heavy traffic volumes and scales up server resources by provisioning new instances of your application in advance of when they are needed. Similarly, when your traffic is low, Rising Cloud spins down resources.

So far, so good, but, as we all know, artificial intelligence is not perfect. What happens when Rising Cloud’s algorithm predicts a rise in traffic and provisions new instances, but that traffic doesn’t arrive? Well, Rising Cloud picks up the tab—you only pay for the resources your application actually uses.

As is common with most cloud platforms, Rising Cloud applications must be stateless—that is, they cannot themselves maintain state from one request to the next. If your application needs to maintain state, you have to bring your own data store. Our use case, creating image thumbnails, is a perfect match for this model. Each thumbnail creation is a self-contained operation and has no effect on any other task.

Creating Image Thumbnails on Demand

As I explained in the previous post, the Cloudflare Worker will send a notification to a configured webhook URL for each operation that it proxies to Backblaze B2 via the Backblaze S3 Compatible API. That notification contains JSON-formatted metadata regarding the bucket, file, and operation. For example, on an image download, the notification looks like this:

{
    "contentLength": 3015523,
    "contentType": "image/png",
    "method": "GET",
    "signatureTimestamp": "20220224T193204Z",
    "status": 200,
    "url": "https://s3.us-west-001.backblazeb2.com/my-bucket/image001.png"
}

If the metadata indicates an image upload (i.e. the method is PUT, the content type starts with image, and so on), the Rising Cloud app will retrieve the full-size image from the Backblaze B2 bucket, create a thumbnail image, and write that image back to the same bucket, modifying the filename to distinguish it from the original.
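
The check described above might look something like the following Python sketch. The real app is written in JavaScript, so this is an illustration of the logic only. The function name is hypothetical, and requiring a successful status code is an assumption about what "and so on" covers.

```python
def is_image_upload(notification: dict) -> bool:
    """Decide whether a Worker notification describes an image upload."""
    is_put = notification.get("method") == "PUT"
    is_image = str(notification.get("contentType", "")).startswith("image/")
    # Assumption: only act on uploads that Backblaze B2 accepted
    succeeded = 200 <= int(notification.get("status", 0)) < 300
    return is_put and is_image and succeeded


# The download notification shown above, and an upload variant of it
download = {
    "contentLength": 3015523,
    "contentType": "image/png",
    "method": "GET",
    "signatureTimestamp": "20220224T193204Z",
    "status": 200,
    "url": "https://s3.us-west-001.backblazeb2.com/my-bucket/image001.png",
}
upload = dict(download, method="PUT")
print(is_image_upload(download), is_image_upload(upload))
```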

Here’s the message flow between the user’s app, the Cloudflare Worker, Backblaze B2, and the Rising Cloud app:

  1. A user uploads an image in a Backblaze B2 client application.
  2. The client app creates a signed upload request, exactly as it would for Backblaze B2, but sends it to the Cloudflare Worker rather than directly to Backblaze B2.
  3. The Worker validates the client’s signature and creates its own signed request.
  4. The Worker sends the signed request to Backblaze B2.
  5. Backblaze B2 validates the signature and processes the upload.
  6. Backblaze B2 returns the response to the Worker.
  7. The Worker forwards the response to the client app.
  8. The Worker sends a notification to the Rising Cloud Web Service.
  9. The Web Service downloads the image from Backblaze B2.
  10. The Web Service creates a thumbnail for the image.
  11. The Web Service uploads the thumbnail to Backblaze B2.

These steps are illustrated in the diagram below.

I decided to write the application in JavaScript, since the Node.js runtime environment and its Express web application framework are well-suited to handling HTTP requests. Also, the open-source Sharp Node.js module performs this type of image processing task 4x-5x faster than either ImageMagick or GraphicsMagick. The source code is available on GitHub.

The entire application is fewer than 150 lines of well-commented JavaScript and uses the AWS SDK’s S3 client library to interact with Backblaze B2 via the Backblaze S3 Compatible API. The core of the application is quite straightforward:

    // Get the image from B2 (returns a readable stream as the body)
    console.log(`Fetching image from ${inputUrl}`);
    const obj = await client.getObject({
      Bucket: bucket,
      Key: keyBase + (extension ? "." + extension : "")
    });

    // Create a Sharp transformer into which we can stream image data
    const transformer = sharp()
      .rotate()                // Auto-orient based on the EXIF Orientation tag
      .resize(RESIZE_OPTIONS); // Resize according to configured options

    // Pipe the image data into the transformer
    obj.Body.pipe(transformer);

    // We can read the transformer output into a buffer, since we know 
    // that thumbnails are small enough to fit in memory
    const thumbnail = await transformer.toBuffer();

    // Remove any extension from the incoming key and append '_tn.'
    const outputKey = path.parse(keyBase).name + TN_SUFFIX 
                        + (extension ? "." + extension : "");
    const outputUrl = B2_ENDPOINT + '/' + bucket + '/' 
                        + encodeURIComponent(outputKey);

    // Write the thumbnail buffer to the same B2 bucket as the original
    console.log(`Writing thumbnail to ${outputUrl}`);
    await client.putObject({
      Bucket: bucket,
      Key: outputKey,
      Body: thumbnail,
      ContentType: 'image/jpeg'
    });

    // We're done - reply with the thumbnail's URL
    response.json({
      thumbnail: outputUrl
    });

One thing you might notice in the above code is that neither the image nor the thumbnail is written to disk. The getObject() API provides a readable stream; the app passes that stream to the Sharp transformer, which reads the image data from B2 and creates the thumbnail in memory. This approach is much faster than downloading the image to a local file, running an image-processing tool such as ImageMagick to create the thumbnail on disk, then uploading the thumbnail to Backblaze B2.

Deploying a Rising Cloud Web Service

With my app written and tested locally on my laptop, it was time to deploy it to Rising Cloud. There are two types of Rising Cloud applications: Web Services and Tasks. A Rising Cloud Web Service directly accepts HTTP requests and returns HTTP responses synchronously, with the condition that it must return an HTTP response within 44 seconds to avoid a timeout, an easy fit for my thumbnail creator app. If I were transcoding video, on the other hand, an operation that might take several minutes, or even hours, a Rising Cloud Task would be more suitable. A Rising Cloud Task is a queueable function, implemented as a Linux executable, which may not require millisecond-level response times.

Rising Cloud uses Docker-style containers to deploy, scale, and manage apps, so the next step was to package my app as a Docker image to deploy as a Rising Cloud Web Service by creating a Dockerfile.

With that done, I was able to configure my app with its Backblaze B2 Application Key and Key ID, endpoint, and the required dimensions for the thumbnail. As with many other cloud platforms, apps can be configured via environment variables. Using the AWS SDK’s variable names for the app’s Backblaze B2 credentials meant that I didn’t have to explicitly handle them in my code—the SDK automatically uses the variables if they are set in the environment.

Rising Cloud Environment

Notice also that the RESIZE_OPTIONS value is formatted as JSON, allowing maximum flexibility in configuring the resize operation. As you can see, I set the withoutEnlargement parameter as well as the desired width, so that images already smaller than the width would not be enlarged.
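As a rough illustration of the JSON-in-an-environment-variable approach, here is a minimal Python sketch (the app itself is written in JavaScript). The RESIZE_OPTIONS name and the width and withoutEnlargement keys come from the text above; the function name, the default values, and the width of 240 are assumptions made for the example.

```python
import json
import os

# Hypothetical defaults, used only when RESIZE_OPTIONS is not set.
DEFAULT_RESIZE_OPTIONS = {"width": 240, "withoutEnlargement": True}


def load_resize_options(environ=os.environ):
    """Parse the JSON-formatted RESIZE_OPTIONS variable into a dict."""
    raw = environ.get("RESIZE_OPTIONS")
    if not raw:
        return dict(DEFAULT_RESIZE_OPTIONS)
    options = json.loads(raw)
    if not isinstance(options, dict):
        raise ValueError("RESIZE_OPTIONS must be a JSON object")
    return options
```

Because the value is parsed as a whole JSON object, new resize settings can be added in the environment without any change to the code that reads them.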

Calling a Rising Cloud Web Service

By default, Rising Cloud requires that app clients supply an API key with each request as an HTTP header with the name X-RisingCloud-Auth:

Rising Cloud Security

So, to test the Web Service, I used the curl command-line tool to send a POST request containing a JSON payload in the format emitted by the Cloudflare Worker and the API key:

curl -d @example-request.json \
	-H 'Content-Type: application/json' \
	-H 'X-RisingCloud-Auth: <your Rising Cloud API key>' \
	https://b2-risingcloud-demo.risingcloud.app/thumbnail

As expected, the Web Service responded with the URL of the newly created thumbnail:

{
  "thumbnail":"https://s3.us-west-001.backblazeb2.com/my-bucket/image001_tn.jpg"
}

(JSON formatted for clarity)
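The same request can be assembled programmatically. The sketch below uses only the Python standard library and builds the request without sending it. The X-RisingCloud-Auth header name and the /thumbnail path come from the text; the function name and the idea of passing the notification as a dict are assumptions for illustration.

```python
import json
import urllib.request


def build_thumbnail_request(endpoint, api_key, notification):
    """Build a POST request carrying the notification and the API key header."""
    body = json.dumps(notification).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "X-RisingCloud-Auth": api_key,
        },
    )

# To send the request and read the reply, pass the returned object to
# urllib.request.urlopen() and parse the response body with json.load().
```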

The final piece of the puzzle was to create a Cloudflare Worker from the Backblaze B2 Proxy template, and add a line of code to include the Rising Cloud API key HTTP header in its notification. The Cloudflare Worker configuration includes its Backblaze B2 credentials, Backblaze B2 endpoint, Rising Cloud API key, and the Web Service endpoint (webhook):

Environment Variables

This short video shows the application in action, and how Rising Cloud spins up new instances to handle an influx of traffic:

Process Your Own B2 Files in Rising Cloud

You can deploy an application on Rising Cloud to respond to any Backblaze B2 operation(s). You might want to upload a standard set of files whenever a bucket is created, or keep an audit log of Backblaze B2 operations performed on a particular set of buckets. And, of course, you’re not limited to triggering your Rising Cloud application from a Cloudflare worker—your app can respond to any HTTP request to its endpoint.

Submit your details here to set up a free trial of Rising Cloud. If you’re not already building on Backblaze B2, sign up to create an account today—the first 10 GB of storage is free!

The post Go Serverless with Rising Cloud and Backblaze B2 appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

Use a Cloudflare Worker to Send Notifications on Backblaze B2 Events

Post Syndicated from Pat Patterson original https://www.backblaze.com/blog/use-a-cloudflare-worker-to-send-notifications-on-backblaze-b2-events/

When building an application or solution on Backblaze B2 Cloud Storage, a common requirement is to be able to send a notification of an event (e.g., a user uploading a file) so that an application can take some action (e.g., processing the file). In this blog post, I’ll explain how you can use a Cloudflare Worker to send event notifications to a wide range of recipients, allowing great flexibility when building integrations with Backblaze B2.

Why Use a Proxy to Send Event Notifications?

Event notifications are useful whenever you need to ensure that a given event triggers a particular action. For example, last month, I explained how a video sharing site running on Vultr’s Infrastructure Cloud could store raw and transcoded videos in Backblaze B2. In that example, when a user uploaded a video to a Backblaze B2 bucket via the web application, the web app sent a notification to a Worker app instructing the Worker to read the raw video file from the bucket, transcode it, and upload the processed file back to Backblaze B2.

A drawback of this approach is that, if we were to create a mobile app to upload videos, we would have to copy the notification logic into the mobile app. As the system grows, so does the maintenance burden. Each new app needs code to send notifications and, worse, if we need to add a new field to the notification message, we have to update all of the apps. If, instead, we move the notification logic from the web application to a Cloudflare Worker, we can send notifications on Backblaze B2 events from a single location, regardless of the origin of the request. This pattern of wrapping an API with a component that presents the exact same API but adds its own functionality is known as a proxy.

Cloudflare Workers: A Brief Introduction

Cloudflare Workers provides a serverless execution environment that allows you to create applications that run on Cloudflare’s global edge network. A Cloudflare Worker application intercepts all HTTP requests destined for a given domain, and can return any valid HTTP response. Your Worker can create that HTTP response in any way you choose. Workers can consume a range of APIs, allowing them to directly interact with the Cloudflare cache, manipulate globally unique Durable Objects, perform cryptographic operations, and more.

Cloudflare Workers often, but not always, implement the proxy pattern, sending outgoing HTTP requests to servers on the public internet in the course of servicing incoming requests. If we implement a proxy that intercepts requests from clients to Backblaze B2, it could both forward those requests to Backblaze B2 and send notifications of those requests to one or more recipient applications.

This example focuses on proxying requests to the Backblaze S3 Compatible API, and can be used with any S3 client application that works with Backblaze B2 by simply changing the client’s endpoint configuration.

Implementing a similar proxy for the B2 Native API is much simpler, since B2 Native API requests are secured by a bearer token rather than a signature. A B2 Native API proxy would simply copy the incoming request, including the bearer token, changing only the target URL. Look out for a future blog post featuring a B2 Native API proxy.

Proxying Backblaze B2 Operations With a Cloudflare Worker

S3 clients send HTTP requests to the Backblaze S3 Compatible API over a TLS-secured connection. Each request includes the client’s Backblaze Application Key ID (access key ID in AWS parlance) and is signed with its Application Key (secret access key), allowing Backblaze B2 to authenticate the client and verify the integrity of the request. The signature algorithm, AWS Signature Version 4 (SigV4), includes the Host header in the signed data, ensuring that a request intended for one recipient cannot be redirected to another. Unfortunately, this is exactly what we want to happen in this use case!

Our proxy Worker must therefore validate the signature on the incoming request from the client, and then create a new signature that it can include in the outgoing request to the Backblaze B2 endpoint. Note that the Worker must be configured with the same Application Key and ID as the client to be able to validate and create signatures on the client’s behalf.

Here’s the message flow:

  1. A user performs an action in a Backblaze B2 client application, for example, uploading an image.
  2. The client app creates a signed request, exactly as it would for Backblaze B2, but sends it to the Cloudflare Worker rather than directly to Backblaze B2.
  3. The Worker validates the client’s signature, and creates its own signed request.
  4. The Worker sends the signed request to Backblaze B2.
  5. Backblaze B2 validates the signature, and processes the request.
  6. Backblaze B2 returns the response to the Worker.
  7. The Worker forwards the response to the client app.
  8. The Worker sends a notification to the webhook recipient.
  9. The recipient takes some action based on the notification.

These steps are illustrated in the diagram below.

The validation and signing process imposes minimal overhead, even for requests with large payloads, since the signed data includes a SHA-256 digest of the request payload, included with the request in the x-amz-content-sha256 HTTP header, rather than the payload itself. The Worker need not even read the incoming request payload into memory, instead passing it to the Cloudflare Fetch API to be streamed directly to the Backblaze B2 endpoint.
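To make the role of the Host header and the payload digest concrete, here is a simplified Python sketch of a SigV4 signature calculation. It is not the Worker's code (the Worker is written in JavaScript). It assumes a request with no query string, an already URI-encoded path, and exactly three signed headers; the function and parameter names are invented for the example.

```python
import hashlib
import hmac


def _hmac(key, msg):
    return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()


def sigv4_signature(secret_key, method, path, host, payload_sha256,
                    amz_date, region, service="s3"):
    """Return the hex SigV4 signature for a request with no query string."""
    date = amz_date[:8]
    # The Host header and the payload digest are part of the signed data;
    # the payload itself is not.
    canonical_headers = (
        f"host:{host}\n"
        f"x-amz-content-sha256:{payload_sha256}\n"
        f"x-amz-date:{amz_date}\n"
    )
    signed_headers = "host;x-amz-content-sha256;x-amz-date"
    canonical_request = "\n".join(
        [method, path, "", canonical_headers, signed_headers, payload_sha256]
    )
    scope = f"{date}/{region}/{service}/aws4_request"
    string_to_sign = "\n".join([
        "AWS4-HMAC-SHA256",
        amz_date,
        scope,
        hashlib.sha256(canonical_request.encode("utf-8")).hexdigest(),
    ])
    # Derive the signing key from the secret through a chain of HMACs.
    key = _hmac(("AWS4" + secret_key).encode("utf-8"), date)
    key = _hmac(key, region)
    key = _hmac(key, service)
    key = _hmac(key, "aws4_request")
    return hmac.new(key, string_to_sign.encode("utf-8"), hashlib.sha256).hexdigest()
```

Signing the same request for two different hosts yields two different signatures, which is why the proxy cannot simply forward the client's Authorization header. Only the 64-character digest of the payload enters the calculation, so the body can be streamed through untouched.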

The Worker returns Backblaze B2’s response to the client unchanged, and creates a JSON-formatted webhook notification containing the following parameters:

  • contentLength: Size of the request body, if there was one, in bytes.
  • contentType: Describes the request body, if there was one. For example, image/jpeg.
  • method: HTTP method, for example, PUT.
  • signatureTimestamp: Request timestamp included in the signature.
  • status: HTTP status code returned from B2 Cloud Storage, for example 200 for a successful request or 404 for file not found.
  • url: The URL requested from B2 Cloud Storage, for example, https://s3.us-west-004.backblazeb2.com/my-bucket/hello.txt.

The Worker submits the notification to Cloudflare for asynchronous processing, so that the response to the client is not delayed. Once the interaction with the client is complete, Cloudflare POSTs the notification to the webhook recipient.
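The shape of the notification can be sketched in a few lines of Python. The field names are those listed above; the function name is hypothetical, and leaving out contentLength and contentType when there is no request body is an assumption based on the "if there was one" wording, not confirmed behavior of the Worker.

```python
import json


def build_notification(method, url, status, signature_timestamp,
                       content_length=None, content_type=None):
    """Return a JSON string with the webhook fields described above."""
    notification = {
        "method": method,
        "signatureTimestamp": signature_timestamp,
        "status": status,
        "url": url,
    }
    # Assumption: body-related fields are only present when there was a body.
    if content_length is not None:
        notification["contentLength"] = content_length
    if content_type is not None:
        notification["contentType"] = content_type
    return json.dumps(notification, sort_keys=True)
```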

Prerequisites

If you’d like to follow the steps below to experiment with the proxy yourself, you will need to:

1. Creating a Cloudflare Worker Based on the Proxy Code

The Cloudflare Worker B2 Webhook GitHub repository contains full source code and configuration details. You can use the repository as a template for your own Worker using Cloudflare’s wrangler CLI. You can change the Worker name (my-proxy in the sample code below) as you see fit:

wrangler generate my-proxy https://github.com/backblaze-b2-samples/cloudflare-b2-proxy
cd my-proxy

2. Configuring and Deploying the Cloudflare Worker

You must configure AWS_ACCESS_KEY_ID and AWS_S3_ENDPOINT in wrangler.toml before you can deploy the Worker. Configuring WEBHOOK_URL is optional—you can set it to empty quotes if you just want a vanity URL for Backblaze B2.

[vars]

AWS_ACCESS_KEY_ID = "<your b2 application key id>"
AWS_S3_ENDPOINT = "<your endpoint - e.g. s3.us-west-001.backblazeb2.com>"
AWS_SECRET_ACCESS_KEY = "Remove this line after you make AWS_SECRET_ACCESS_KEY a secret in the UI!"
WEBHOOK_URL = "<e.g. https://api.example.com/webhook/1 >"

Note the placeholder for AWS_SECRET_ACCESS_KEY in wrangler.toml. All variables used in the Worker must be set before the Worker can be published, but you should not save your Backblaze B2 application key to the file (see the note below). We work around these constraints by initializing AWS_SECRET_ACCESS_KEY with a placeholder value.

Use the CLI to publish the Worker project to the Cloudflare Workers environment:

wrangler publish

Now log in to the Cloudflare dashboard, navigate to your new Worker, and click the Settings tab, Variables, then Edit Variables. Remove the placeholder text, and paste your Backblaze B2 Application Key as the value for AWS_SECRET_ACCESS_KEY. Click the Encrypt button, then Save. The environment variables should look similar to this:

Finally, you must remove the placeholder line from wrangler.toml. If you do not do so, then the next time you publish the Worker, the placeholder value will overwrite your Application Key.

Why Not Just Set AWS_SECRET_ACCESS_KEY in wrangler.toml?

You should never, ever save secrets such as API keys and passwords in source code files. It’s too easy to forget to remove sensitive data from source code before sharing it either privately or, worse, on a public repository such as GitHub.

You can access the Worker via its default endpoint, which will have the form https://my-proxy.<your-workers-subdomain>.workers.dev, or create a DNS record in your own domain and configure a route associating the custom URL with the Worker.

If you try accessing the Worker URL via the browser, you’ll see an error message:

<Error>
  <Code>AccessDenied</Code>
  <Message>Unauthenticated requests are not allowed for this api</Message>
</Error>

This is expected—the Worker received the request, but the request did not contain a signature.

3. Configuring the Client Application

The only change required in your client application is the S3 endpoint configuration. Set it to your Cloudflare Worker’s endpoint rather than your Backblaze account’s S3 endpoint. As mentioned above, the client continues to use the same Application Key and ID as it did when directly accessing the Backblaze S3 Compatible API.
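For a Python client using boto3, the change might look like the sketch below. The endpoint_url, aws_access_key_id, and aws_secret_access_key keyword arguments are standard boto3.client parameters; the helper name and the Worker hostname are hypothetical.

```python
def s3_client_kwargs(worker_host, key_id, application_key):
    """Return boto3.client() arguments that point an S3 client at the Worker."""
    return {
        # The Worker's endpoint replaces the Backblaze B2 S3 endpoint.
        "endpoint_url": f"https://{worker_host}",
        # The credentials stay exactly as they were for direct access.
        "aws_access_key_id": key_id,
        "aws_secret_access_key": application_key,
    }

# With boto3 installed, the client would be created like this:
#   import boto3
#   s3 = boto3.client("s3", **s3_client_kwargs(
#       "my-proxy.example.workers.dev", key_id, application_key))
```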

4. Implementing a Webhook Consumer

The webhook consumer must accept JSON-formatted messages via HTTP POSTs at a public endpoint accessible from the Cloudflare Workers environment. The webhook notification looks like this:

{
  "contentLength": 30155,
  "contentType": "image/png",
  "method": "PUT",
  "signatureTimestamp": "20220224T193204Z",
  "status": 200,
  "url": "https://s3.us-west-001.backblazeb2.com/my-bucket/image001.png"
}
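If you write your own consumer, a minimal version might look like this Python sketch, which uses only the standard library. Treating method, signatureTimestamp, status, and url as required fields is an assumption, as are the handler name, the port, and the choice to reply with 204.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Assumption: these four fields are present in every notification.
REQUIRED_FIELDS = ("method", "signatureTimestamp", "status", "url")


def parse_notification(body):
    """Parse and sanity-check a webhook notification body."""
    notification = json.loads(body)
    if not isinstance(notification, dict):
        raise ValueError("notification must be a JSON object")
    missing = [f for f in REQUIRED_FIELDS if f not in notification]
    if missing:
        raise ValueError("missing fields: " + ", ".join(missing))
    return notification


class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        try:
            note = parse_notification(self.rfile.read(length))
        except ValueError as err:  # includes malformed JSON
            self.send_error(400, str(err))
            return
        print(f"{note['method']} {note['url']} -> {note['status']}")
        self.send_response(204)
        self.end_headers()


def serve(port=8080):
    """Run the consumer until interrupted."""
    HTTPServer(("", port), WebhookHandler).serve_forever()
```

Calling serve() starts the listener. It would still need to be reachable from the public internet for the Worker to deliver notifications to it.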

You might implement the webhook consumer in your own application or, alternatively, use an integration platform such as IFTTT, Zapier, or Pipedream to trigger actions in downstream systems. I used Pipedream to create a workflow that logs each Backblaze B2 event as a new row in a Google Sheet. Watch it in action in this short video:

Put the Proxy to Work!

The Cloudflare Worker/Backblaze B2 Proxy can be used as-is in a wide variety of integrations—anywhere you need an event in Backblaze B2 to trigger an action elsewhere. At the same time, it can be readily adapted for different requirements. Here are a few ideas.

In this initial implementation, the client uses the same credentials to access the Worker as the Worker uses to access Backblaze B2. It would be straightforward to use different credentials for the upstream and downstream connections, ensuring that clients can’t bypass the Worker and access Backblaze B2 directly.

POSTing JSON data to a webhook endpoint is just one of many possibilities for sending notifications. You can integrate the worker with any system accessible from the Cloudflare Workers environment via HTTP. For example, you could use a stream-processing platform such as Apache Kafka to publish messages reliably to any number of consumers, or, similarly, send a message to an Amazon Simple Notification Service (SNS) topic for distribution to SNS subscribers.

As a final example, the proxy has full access to the request and response payloads. Rather than sending a notification to a separate system, the worker can operate directly on the data, for example, transparently compressing incoming uploads and decompressing downloads. The possibilities are endless.

How will you put the Cloudflare Worker Backblaze B2 Proxy to work? Sign up for a Backblaze B2 account and get started!

The post Use a Cloudflare Worker to Send Notifications on Backblaze B2 Events appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

Media Transcoding With Backblaze B2 and Vultr Optimized Cloud Compute

Post Syndicated from Pat Patterson original https://www.backblaze.com/blog/media-transcoding-with-backblaze-b2-and-vultr-optimized-cloud-compute/

Since announcing the Backblaze + Vultr partnership last year, we’ve seen our mutual customers build a wide variety of applications combining Vultr’s Infrastructure Cloud with Backblaze B2 Cloud Storage, taking advantage of zero-cost data transfer between Vultr and Backblaze. This week, Vultr announced Optimized Cloud Compute instances, virtual machines pairing dedicated best-in-class AMD CPUs with just the right amount of RAM and NVMe SSDs.

To mark the occasion, I built a demonstration that both showcases this new capability and gives you an example application to adapt to your own use cases.

Imagine you’re creating the next big video sharing site—CatTube—a spin-off of Catblaze, your feline-friendly backup service. You’re planning all sorts of amazing features, but the core of the user experience is very familiar:

  • A user uploads a video from their mobile or desktop device.
  • The user’s video is available for viewing on a wide variety of devices, from anywhere in the world.

Let’s take a high-level look at how this might work…

Transcoding Explained: How Video Sharing Sites Make Videos Shareable

The user will upload their video to a web application from their browser or a mobile app. The web application must store the uploaded user videos in a highly scalable, highly available service—enter Backblaze B2 Cloud Storage. Our customers store, in the aggregate, petabytes of media data including video, audio, and still images.

But, those videos may be too large for efficient sharing and streaming. Today’s mobile devices can record video with stunning quality at 4K resolution, typically 3840 × 2160 pixels. While 4K video looks great, the issue is that even with compression, it’s a lot of data—about 1MB per second. Not all of your viewers will have that kind of bandwidth available, particularly if they’re on the move.
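To put that figure in perspective, here is a back-of-the-envelope calculation in Python. It takes the 1MB per second rate quoted above at face value; the ten-minute clip is a hypothetical example.

```python
MB_PER_SECOND = 1  # data rate quoted above for compressed 4K video


def video_size_mb(duration_seconds, mb_per_second=MB_PER_SECOND):
    """Approximate size of a clip in megabytes."""
    return duration_seconds * mb_per_second


def required_mbit_per_second(mb_per_second=MB_PER_SECOND):
    """Sustained bandwidth needed to stream at that rate, in megabits per second."""
    return mb_per_second * 8


ten_minutes = 10 * 60
print(video_size_mb(ten_minutes), "MB")        # size of a ten-minute clip
print(required_mbit_per_second(), "Mbit/s")    # bandwidth a viewer needs
```

At that rate, a ten-minute clip comes to 600MB, and a viewer needs a steady 8Mbit/s connection to watch it without buffering.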

So, CatTube, in common with other popular video sharing sites, will need to convert raw uploaded video to one or more standard, lower-resolution formats, a process known as transcoding.

Transcoding is a very different workload from running a web application’s backend. Where an application server requires high I/O capability, but relatively little CPU power, transcoding is extremely CPU-intensive. You decide that you’ll need two sets of machines for CatTube—application servers and workers. The worker machines can be optimized for the transcoding task, taking advantage of the fastest available CPUs.

For these tasks, you need appropriate cloud compute instances. I’ll walk you through how I implemented CatTube as a very simple video sharing site with Backblaze B2 and Vultr’s Infrastructure Cloud using Vultr’s Cloud Compute instances for the application servers and their new Optimized Cloud Compute instances for the transcoding workers.

Building a Video Sharing Site With Backblaze B2 + Vultr

The video sharing example comprises a web application, written in Python using the Django web framework, and a worker application, also written in Python, but using the Flask framework.

Here’s how the pieces fit together:

  1. The user uploads a video from their browser to the web app.
  2. The web app uploads the raw video to a private bucket on Backblaze B2.
  3. The web app sends a message to the worker instructing it to transcode the video.
  4. The worker downloads the raw video to local storage and transcodes it, also creating a thumbnail image.
  5. The worker uploads the transcoded video and thumbnail to Backblaze B2.
  6. The worker sends a message to the web app with the addresses of the input and output files in Backblaze B2.
  7. Viewers around the world can enjoy the video.

These steps are illustrated in the diagram below.


There’s a more detailed description in the Backblaze B2 Video Sharing Example GitHub repository, as well as all of the code for the web application and the worker. Feel free to fork the repository and use the code as a starting point for your own projects.

Here’s a short video of the system in action:

Some Caveats:

Note that this is very much a sample implementation. The web app and the worker communicate via HTTP—this works just fine for a demo, but doesn’t account for the worker being too busy to receive the message. Nor does it scale to multiple workers. In a production implementation, these issues would be addressed by the components communicating via an asynchronous messaging system such as Kafka. Similarly, this sample transcodes to a single target format: 720p. A real video sharing site would transcode the raw video to a range of formats and resolutions.
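The queue-based alternative can be illustrated with a small Python sketch. It uses an in-process queue.Queue and threads as a stand-in for a real messaging system such as Kafka, so it shows the pattern rather than a production design. The transcode function is a placeholder that only derives an output file name.

```python
import queue
import threading


def transcode(raw_name):
    """Placeholder for real transcoding: derive the output file name."""
    return raw_name.rsplit(".", 1)[0] + "_720p.mp4"


def run_workers(jobs, num_workers=2):
    """Distribute jobs to workers via a queue and collect the results."""
    job_queue = queue.Queue()
    results = []
    lock = threading.Lock()

    def worker():
        while True:
            job = job_queue.get()
            if job is None:  # sentinel: no more work
                break
            output = transcode(job)
            with lock:
                results.append(output)

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for thread in threads:
        thread.start()
    # The producer never blocks on a busy worker; jobs simply wait in the queue.
    for job in jobs:
        job_queue.put(job)
    for _ in threads:
        job_queue.put(None)
    for thread in threads:
        thread.join()
    return results
```

Because the web app only enqueues work, a busy worker delays a job instead of losing it, and adding capacity is a matter of raising num_workers.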

Want to Try It for Yourself?

Vultr’s new Optimized Cloud Compute instances are a perfect match for CPU-intensive tasks such as media transcoding. Zero-cost ingress and egress between Backblaze B2 and Vultr’s Infrastructure Cloud allow you to build high performance, scalable applications to satisfy a global audience. Sign up for Backblaze B2 and Vultr’s Infrastructure Cloud today, and get to work!

The post Media Transcoding With Backblaze B2 and Vultr Optimized Cloud Compute appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

Building a Multiregion Origin Store With Backblaze B2 + Fastly Compute@Edge

Post Syndicated from Pat Patterson original https://www.backblaze.com/blog/building-a-multiregion-origin-store-with-backblaze-b2-fastly-computeedge/

Backblaze B2 Cloud Storage customers have long leveraged our partner Fastly’s Deliver@Edge CDN as an essential component of a modern, scalable web architecture. Complementing Deliver@Edge, Compute@Edge is a serverless computing environment built on the same caching platform to provide a general-purpose compute layer between the cloud and end users. Today, we’re excited to celebrate Fastly’s announcement of its Compute@Edge partner ecosystem.

Serverless computing is quickly gaining popularity among developers for its simplicity, agility, and functionality. In the serverless model, cloud providers allocate resources to applications on demand, managing the compute infrastructure on behalf of their customers. The term, “serverless,” is a little misleading: The servers are actually still there, but customers don’t have to get involved in their provisioning, configuration, maintenance, or scaling.

Fastly’s Compute@Edge represents the next generation of serverless computing—purpose-built for better performance, reduced latency, and enhanced visibility and security. Using Fastly’s tools, a developer can create an edge application, test it locally, then with one command, deploy it to the Compute@Edge platform. When a request for that application reaches any of Fastly’s global network of edge servers, the application is launched and running in microseconds and can instantly scale to tens of thousands of requests per second.

It’s difficult to overstate the power and flexibility this puts in your hands as a developer—your application can be running on every edge server, with access to every attribute of its incoming requests, assembling responses in any way you choose. For an idea of the possibilities, check out the Compute@Edge demos, in particular, the implementation of the video game classic, “Doom.”

We don’t have space in a single blog post to explore an edge application of that magnitude, but read on for a simple example of how you can combine Fastly’s Compute@Edge with Backblaze B2 to improve your website’s user experience, directing requests to the optimal origin store end point based on the user’s location.

The Case for a Multiregion Origin Store

Although the CDN caches resources to improve performance, if a requested resource is not present in the edge server cache, it must be fetched from the origin store. When the edge server is close to the origin store, the increase in latency is minimal. If, on the other hand, the edge server is on a different continent from the origin store, it can take significantly longer to retrieve uncached content. In most cases, this additional delay is hardly noticeable, but for websites with many resources that are frequently updated, it can add up to a sluggish experience for users. A solution is for the origin store to maintain multiple copies of a website’s content, each at an end point in a different region. This approach can dramatically reduce the penalty for cache misses, improving the user experience.

There is a problem here, though: How do we ensure that a given CDN edge server directs requests to the “best” end point? The answer: build an application that uses the edge server’s location to select the end point. I’ll explain how I did just that, creating a Fastly Compute@Edge application to proxy requests to Backblaze B2 buckets.

Creating an Application on Fastly Compute@Edge

The Fastly Compute@Edge developer documentation did a great job of walking me through creating a Compute@Edge application. As part of the process, I had to choose a starter kit—a simple working application targeting a specific use case. The Static Content starter kit was the ideal basis for my application—it demonstrates many useful techniques, such as generating an AWS V4 Signature and manipulating the request’s Host HTTP header to match the origin store.

The core of the application is just a few lines written in the Rust programming language:

#[fastly::main]
fn main(mut req: Request) -> Result<Response, Error> {
    // 1. Where is the application running?
    let pop = get_pop(&req);

    // 2. Choose the origin based on the edge server (pop) -
    // default to US if there is no match on the pop
    let origin = POP_ORIGIN.get(pop.as_str()).unwrap_or(&US_ORIGIN);

    // 3. Remove the query string to improve cache hit ratio
    req.remove_query();

    // 4. Set the `Host` header to the bucket name + host rather than
    // our Compute@Edge endpoint
    let host = format!("{}.{}", origin.bucket_name, origin.endpoint);
    req.set_header(header::HOST, &host);

    // 5. Copy the modified client request to form the backend request
    let mut bereq = req.clone_without_body();

    // 6. Set the AWS V4 authentication headers
    set_authentication_headers(&mut bereq, &origin);

    // 7. Send the request to the backend and assign its response to `beresp`
    let mut beresp = bereq.send(origin.backend_name)?;

    // 8. Set a response header indicating the origin that we used
    beresp.set_header("X-B2-Host", &host);

    // 9. Return the response to the client
    Ok(beresp)
}

In step one, the get_pop function returns the three-letter abbreviation for the edge server, or point of presence (POP). For the purposes of testing, you can specify a POP as a query parameter in your HTTP request. For example, https://three.interesting.words.edgecompute.app/image.png?pop=AMS will simulate the application running on the Amsterdam POP.

Next, in step two, the application looks up the POP in a mapping of POPs to Backblaze B2 end points. There are about a hundred Fastly POPs spread around the world; I simply took the list generated by running the Fastly command-line tool with the pops argument, and assigned POPs to Backblaze B2 end points based on their location:

  • POPs in North America, South America, and Asia/Pacific map to the U.S. end point.
  • POPs in Europe and Africa map to the EU end point.
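The lookup in step two, together with the Host header construction in step four, can be sketched in Python as follows. The real application is the Rust code above. The bucket names are hypothetical, the POP codes other than AMS are illustrative, and only a handful of the roughly one hundred POPs are shown.

```python
from collections import namedtuple

Origin = namedtuple("Origin", ["bucket_name", "endpoint"])

# Hypothetical buckets, one per Backblaze B2 region.
US_ORIGIN = Origin("my-bucket-us", "s3.us-west-001.backblazeb2.com")
EU_ORIGIN = Origin("my-bucket-eu", "s3.eu-central-003.backblazeb2.com")

# A small, illustrative subset of the POP-to-origin mapping.
POP_ORIGIN = {
    "AMS": EU_ORIGIN,  # Amsterdam
    "LHR": EU_ORIGIN,  # London
    "SJC": US_ORIGIN,  # San Jose
    "NRT": US_ORIGIN,  # Tokyo (Asia/Pacific maps to the U.S. end point)
}


def origin_host(pop):
    """Return the Host header value for a POP, defaulting to the U.S. origin."""
    origin = POP_ORIGIN.get(pop, US_ORIGIN)
    return f"{origin.bucket_name}.{origin.endpoint}"
```

An unknown POP falls through to the U.S. origin, mirroring the unwrap_or(&US_ORIGIN) default in the Rust code.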

I won’t step through the rest of the logic in detail here—the comments in the code sample above cover the basics; feel free to examine the code in detail on GitHub if you’d like a closer look.

Serve Your Own Data From Multiple Backblaze B2 Regions

As you can see in the screenshot above, Fastly has implemented a Deploy to Fastly button. You can use this to create your own copy of the Backblaze B2 Compute@Edge demo application in just a couple of minutes. You’ll need to gather a few prerequisites before you start:

  • You must create Backblaze B2 accounts in both the U.S. and EU regions. If you have an existing account and you’re not sure which region it’s in, just take a look at the end point for one of your buckets. For example, this bucket is in the U.S. West region:

    To create your second account, go to the Sign Up page, and click the Region drop-down on the right under the big, red Sign Up button:

    Pick the region in which you don’t already have an account, and enter an email and password. Remember, your new account comes with 10GB of storage, free of charge, so there’s no need to enter your credit card details.

    Note: You’ll need to use a different email address from your existing account. If you don’t have a second email address, you can use the plus trick (officially known as sub-addressing) and reuse an existing email address. For example, if you used [email protected] for your existing B2 Cloud Storage account in the U.S. region, you can use [email protected] for your new EU account. Mail will be routed to the same inbox, and Backblaze B2 will be satisfied that it’s a different email address. This technique isn’t limited to Gmail, by the way, it works with many email providers.

  • Create a private bucket in each account, and use your tool of choice to copy the same data into each of them. Make a note of the end point for each bucket.
  • Create an application key with read access to each bucket.
  • Sign up for a free Fastly account if you don’t already have one. Right now, this includes free credits for Compute@Edge.
  • Sign up for a free GitHub account.
  • Go to the Backblaze B2/Fastly Compute@Edge Demo GitHub repository, click the Deploy to Fastly button, and follow the prompts. The repository will be forked to your GitHub account and then deployed to Fastly.
  • Important: There is one post-deploy step you must complete before your application will work! In your new GitHub repository, navigate to src/config.rs and hit the pencil icon near the top right to edit the file. Change the origin configuration in lines 18-31 to match your buckets and their end points. Alternatively, you can, of course, clone the repository to your local machine, edit it there, and push the changes back to GitHub.

Once you have your accounts and buckets created, it takes just a few minutes to deploy the application. Watch me walk through the process:

What Can You Do With Fastly’s Compute@Edge and Backblaze B2?

My simple demo application only scratches the surface of Compute@Edge. How could you combine Fastly’s edge computing platform with Backblaze B2 to create a new capability for your website? Check out Fastly’s collection of over 100 Compute@Edge code samples for inspiration. If you come up with something neat and share it on GitHub, let me know in the comments and I’ll round up a bundle of Backblaze-branded goodies, just for you!

The post Building a Multiregion Origin Store With Backblaze B2 + Fastly Compute@Edge appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

Explore the Backblaze S3 Compatible API With Our New Postman Collection

Post Syndicated from Pat Patterson original https://www.backblaze.com/blog/explore-the-backblaze-s3-compatible-api-with-our-new-postman-collection/

Postman is a platform for building and using APIs. API providers such as Backblaze can use Postman to build API documentation and provide a live environment for developers to experiment with those APIs. Today, you can interact with Backblaze B2 Cloud Storage via our new Postman Collection for the Backblaze S3 Compatible API.

Using the Backblaze S3 Compatible API

The Backblaze S3 Compatible API implements the most commonly used S3 operations, allowing applications to integrate with Backblaze B2 in exactly the same way they do with Amazon S3. Many of our Alliance Partners have used the S3 Compatible API in integrating their products and services with Backblaze B2. Often, integration is as simple as allowing the user to specify a custom endpoint, for example, https://s3.us-west-001.backblazeb2.com, alongside their API credentials in the S3 settings, and verifying that the application works as expected with Backblaze B2.

The Backblaze B2 Native API, introduced alongside Backblaze B2 back in 2015, provides a low-level interface to B2 Cloud Storage. We generally recommend that developers use the S3 Compatible API when writing new applications and integrations, as it is supported by a wider range of SDKs and libraries, and many developers already have experience with Amazon S3. You can use the Backblaze B2 web console or the B2 Native API to access functionality, such as application key management and lifecycle rules, that is not covered by the S3 Compatible API.
 
Our post on the B2 Native and S3 Compatible APIs provides a more detailed comparison.

Most applications and scripts use one of the AWS SDKs or the S3 commands in the AWS CLI to access Backblaze B2. All of the SDKs, and the CLI, allow you to override the default Amazon S3 endpoint in favor of Backblaze B2. Sometimes, though, you might want to interact directly with Backblaze B2 via the S3 Compatible API, perhaps in debugging an issue, or just to better understand how the service works.

Exploring the Backblaze S3 Compatible API in Postman

Our new Backblaze S3 Compatible API Documentation page is the definitive reference for developers wishing to access Backblaze B2 directly via the S3 Compatible API.

In addition to reading the documentation, you can click the Run in Postman button on the top right of the page, log in to the Postman website or desktop app (creating a Postman account is free), and interact with the API.

Integrate With Backblaze B2

Whether you are backing up, archiving data, or serving content via the web, Backblaze B2 is an easy-to-use and, at a quarter of the cost of Amazon S3, cost-effective cloud object storage solution. If you’re not already using Backblaze B2, sign up now and try it out: your first 10GB of storage is free!

The post Explore the Backblaze S3 Compatible API With Our New Postman Collection appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

Free Image Hosting With Cloudflare Transform Rules and Backblaze B2

Post Syndicated from Pat Patterson original https://www.backblaze.com/blog/free-image-hosting-with-cloudflare-transform-rules-and-backblaze-b2/

Before I dive into using Cloudflare Transform Rules to implement image hosting on Backblaze B2 Cloud Storage, I’d like to take a moment to introduce myself. I’m Pat Patterson, recently hired by Backblaze as chief developer evangelist. I’ve been working with technology and technical communities for close to two decades, at companies such as Sun Microsystems and Salesforce. I’ll be creating and delivering technical content for you, our Backblaze B2 community, and advocating on your behalf within Backblaze. Feel free to follow my journey and reach out to me via Twitter or LinkedIn.

Cloudflare Transform Rules

Now, on with the show! Cloudflare Transform Rules give you access to HTTP traffic at the CDN edge server, allowing you to manipulate the URI path, query string, and HTTP headers of incoming requests and outgoing responses. Where Cloudflare Workers allows you to write JavaScript code that executes in the same environment, Transform Rules give you much of the same power without the semicolons and curly braces.

Let’s look at a specific use case: implementing image hosting on top of a cloud object store. Backblaze power user James Ross wrote an excellent blog post back in August 2019, long before the introduction of Transform Rules, explaining how to do this with Cloudflare Workers and Backblaze B2. We’ll see how much of James’ solution we can recreate with Transform Rules, without writing any code. We’ll also discover how the combination of Cloudflare and Backblaze allows you to create your own, personal 10GB image hosting site for free.

Implementing Image Hosting on a Cloud Object Store

James’ requirements were simple:

  • Serve image files from a custom domain, such as files.example.com, rather than the cloud storage provider’s domain.
  • Remove the bucket name, and any other extraneous information, from the URL.
  • Remove extraneous headers, such as the object ID, from the HTTP response.
  • Improve caching (both browser and edge cache) for images.
  • Add basic CORS headers to allow embedding of images on external sites.

I’ll work through each of these requirements in this blog post, and wrap up by explaining why Backblaze B2 might be a better long-term provider for this and many other cloud object storage use cases than other cloud object stores.

It’s worth noting that nothing here is Backblaze B2-specific—the user’s browser is requesting objects from a B2 Cloud Storage public bucket via their URLs, just as it would with any other cloud object store. The techniques are exactly the same on Amazon S3, for example.

Prerequisites

You’ll need accounts with both Cloudflare and Backblaze. You can get started for free with both.

You’ll also need your own DNS domain, which I’ll call example.com in this article, on which you can create subdomains such as files.example.com. If you’ve read this far, you likely already have at least one. Otherwise, you can register a new domain at Cloudflare for a few dollars a year, or your local equivalent.

Create a Bucket for Your Images

If you already have a B2 Cloud Storage bucket you want to use for your image store, you can skip this section. Note: It doesn’t matter whether you created the bucket and its objects via the B2 Native API, the Backblaze S3 Compatible API, or any other mechanism—your objects are accessible to Cloudflare via their friendly URLs.

Log in to Backblaze, and click Buckets on the left under B2 Cloud Storage, then Create a Bucket. You will need to give your bucket a unique name, and make it public. Leave the other settings with their default values.

Note that the bucket name must be globally unique within Backblaze B2, so you can’t just call it something like “myfiles.” You’ll hide the bucket name from public view, so you can call it literally anything, as long as there isn’t already a Backblaze B2 bucket with that name.

Finally, click Upload/Download and upload a test file to your new bucket.

Click the file to see its details, including its various URLs.

In the next step, you’ll rewrite requests that use your custom subdomain, for example, https://files.example.com/smiley.png, to the friendly URL, which has the form https://f004.backblazeb2.com/file/metadaddy-public/smiley.png.

Make a note of the hostname in the friendly URL. As you can see in the previous paragraph, mine is f004.backblazeb2.com.

Create a DNS Subdomain for Your Image Host

You will need to activate your domain (example.com, rather than files.example.com) in your Cloudflare account, if you have not already done so.

Now, in the Cloudflare dashboard, create your subdomain by adding a DNS CNAME record pointing to the bucket hostname you made a note of earlier.

I created files.superpat.com, which points to my bucket’s hostname, f004.backblazeb2.com.

If you test this right now by going to your test file’s URL in your custom subdomain, for example, https://files.example.com/file/my-unique-bucket-name/smiley.png, after a few seconds you will see a 522 “connection timed out” error from Cloudflare.

This is because, by default, Cloudflare accesses the upstream server via plain HTTP, rather than HTTPS. Backblaze only supports secure HTTPS connections, so the HTTP request fails. To remedy this, in the SSL/TLS section of the Cloudflare dashboard, change the encryption mode from “Flexible” to “Full (strict),” so that Cloudflare connects to Backblaze via HTTPS, and requires a CA-issued certificate.

Now you should be able to access your test file in your custom subdomain via a URL of the form https://files.example.com/file/my-unique-bucket-name/smiley.png. The next task is to create the first Transform Rule to remove /file/my-unique-bucket-name from the URL.

Rewrite the URL Path on Incoming Requests

There are three varieties of Cloudflare Transform Rules:

  • URL Rewrite Rules: Rewrite the URL path and query string of an HTTP request.
  • HTTP Request Header Modification Rules: Set the value of an HTTP request header or remove a request header.
  • HTTP Response Header Modification Rules: Set the value of an HTTP response header or remove a response header.

Click Rules on the left of the Cloudflare dashboard, then Transform Rules. You’ll see that the Cloudflare free plan includes 10 Transform Rules—plenty for our purposes. Click Create Transform Rule, then Rewrite URL.

It’s useful to pause for a moment and think about what we need to ask Cloudflare to do. Users will be requesting URLs of the form https://files.example.com/smiley.png, and we want the request to Backblaze B2 to be like https://f004.backblazeb2.com/file/metadaddy-public/smiley.png. We’ve already taken care of the domain part of the URL, so it becomes clear that all we need to do is prefix the outgoing URL with /file/<bucket name>.

Give your rule a descriptive name such as “Add file and bucket name.”

There is an opportunity to set a condition that incoming requests must match to fire the trigger. In James’ article, he tested that the path did not already begin with the /file/<bucket name> prefix, so that you can refer to a file with either the short or long URL.

At first glance, the Cloudflare dashboard doesn’t offer “does not start with” as an operator.

However, clicking Edit expression reveals a more powerful way of specifying the condition: the Cloudflare Rules language, which allows us to express our condition precisely.

Moving on, Cloudflare offers static and dynamic options for rewriting the path. A static rewrite would apply the same value to the URL path of every request. This use case requires a dynamic rewrite, where, for each request, Cloudflare evaluates the value as an expression which yields the path.

Your expression would prepend /file/<bucket name> to the existing path.
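To make the logic concrete, here is a rough Python illustration of what the rule does. It is not the Rules expression itself: rewrite_path and BUCKET_PREFIX are hypothetical names, and the bucket name is simply the one from my friendly URL. In the Rules language, the built-in starts_with and concat functions would presumably play the roles that str.startswith and string concatenation play here.

```python
# Illustrative only: mimics the URL Rewrite Rule's condition and dynamic path.
BUCKET_PREFIX = "/file/metadaddy-public"  # assumed: /file/<bucket name>


def rewrite_path(path: str, prefix: str = BUCKET_PREFIX) -> str:
    """Return the path that would be sent upstream to Backblaze B2."""
    # Condition: skip paths that already carry the prefix, so that both the
    # short and the long form of the URL keep working.
    if path.startswith(prefix + "/"):
        return path
    # Dynamic rewrite: prepend the prefix to the incoming path.
    return prefix + path


print(rewrite_path("/smiley.png"))
print(rewrite_path("/file/metadaddy-public/smiley.png"))
```

Both calls yield the same upstream path, which is the point of the condition: a request is never prefixed twice.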

Save the Transform Rule, and try to access your test file again, this time without the /file/<bucket name> prefix in the URL path, for example: https://files.example.com/smiley.png.

You should see your test file, as expected.

Great! Now, let’s take a look at those HTTP headers in the response.

Remove HTTP Headers From the Response

You could use Chrome Developer Tools to view the response headers, but I prefer the curl command line tool. I used the --head argument to show the HTTP headers without the response body, since my terminal would not be happy with binary image data!

Note: I’ve removed some extraneous headers from this and subsequent HTTP responses for clarity and length.

% curl --head https://files.superpat.com/smiley.png
HTTP/2 200
date: Thu, 20 Jan 2022 01:26:10 GMT
content-type: image/png
content-length: 23889
x-bz-file-name: smiley.png
x-bz-file-id: 4_zf1f51fb913357c4f74ed0c1b_f1163cc3f37a60613_d20220119_m204457_c004_v0402000_t0044
x-bz-content-sha1: 3cea1118fbaab607a7afd930480670970b278586
x-bz-upload-timestamp: 1642625097000
x-bz-info-src_last_modified_millis: 1642192830529
cache-control: max-age=14400
cf-cache-status: MISS
last-modified: Thu, 20 Jan 2022 01:26:10 GMT

Our goal is to remove all the x-bz headers. Create a Modify Response Header rule and set its name to something like “Remove Backblaze B2 Headers.” We want this rule to apply to all traffic, so the match expression is simple.

Unfortunately, there isn’t a way to tell Cloudflare to remove all the headers that are prefixed x-bz, so we just have to list each of them in the rule.
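As a sketch of the effect we are after, the Python below strips an explicit list of headers from a response. The header names come from the curl output above; remove_headers and HEADERS_TO_REMOVE are hypothetical names used only for illustration, and this is a model of the rule’s intent rather than anything Cloudflare runs.

```python
# Headers to strip, taken from the x-bz headers in the curl output above.
HEADERS_TO_REMOVE = [
    "x-bz-file-name",
    "x-bz-file-id",
    "x-bz-content-sha1",
    "x-bz-upload-timestamp",
    "x-bz-info-src_last_modified_millis",
]


def remove_headers(headers: dict, names=HEADERS_TO_REMOVE) -> dict:
    """Return a copy of headers without the listed names (case-insensitive)."""
    blocked = {name.lower() for name in names}
    return {k: v for k, v in headers.items() if k.lower() not in blocked}


response = {
    "content-type": "image/png",
    "x-bz-file-name": "smiley.png",
    "x-bz-upload-timestamp": "1642625097000",
    "cache-control": "max-age=14400",
}
print(remove_headers(response))
```

Because there is no prefix match, any new x-bz header that appears in responses later would have to be added to the list by hand.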

Save the rule, and request your test file again. You should see fewer headers:

% curl --head https://files.superpat.com/smiley.png
HTTP/2 200
date: Thu, 20 Jan 2022 01:57:01 GMT
content-type: image/png
content-length: 23889
x-bz-info-src_last_modified_millis: 1642192830529
cache-control: max-age=14400
cf-cache-status: HIT
age: 1851
last-modified: Thu, 20 Jan 2022 01:26:10 GMT

Note: As you can see, for some reason Cloudflare does not remove the x-bz-info-src_last_modified_millis header. I’ve reported this to Cloudflare as a bug.

Optimize Cache Efficiency via the ETag and Cache-Control HTTP Headers

We can follow James’ lead in making caching more efficient by leveraging the ETag header. As explained in the MDN Web Docs for ETag:

The ETag (or entity tag) HTTP response header is an identifier for a specific version of a resource. It lets caches be more efficient and save bandwidth, as a web server does not need to resend a full response if the content was not changed.

Essentially, a cache can just request the HTTP headers for a resource and only proceed to fetch the resource body if the ETag has changed.
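Here is a minimal sketch of that idea in Python. It is a simulation, not real HTTP: fetch_with_etag is a hypothetical helper, and head and get stand in for whatever functions return the current ETag and the body. Real caches typically do this with a conditional request (an If-None-Match header answered by 304 Not Modified) rather than a separate header fetch.

```python
def fetch_with_etag(url, cache, head, get):
    """Return the body for url, reusing the cached copy if the ETag is unchanged.

    cache maps url -> (etag, body); head(url) returns the current ETag
    (or None); get(url) returns the full body.
    """
    etag = head(url)
    cached = cache.get(url)
    if cached is not None and etag is not None and cached[0] == etag:
        return cached[1]  # ETag unchanged: no need to download the body again
    body = get(url)  # first request, or the resource changed
    cache[url] = (etag, body)
    return body


# Tiny in-memory "origin" to exercise the helper.
origin = {"/smiley.png": ("etag-1", b"image bytes")}
cache = {}
downloads = []


def head(url):
    return origin[url][0]


def get(url):
    downloads.append(url)
    return origin[url][1]


fetch_with_etag("/smiley.png", cache, head, get)
fetch_with_etag("/smiley.png", cache, head, get)
print(len(downloads))  # the body was only downloaded once
```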

James constructed the ETag by using one of x-bz-content-sha1, x-bz-info-src_last_modified_millis, or x-bz-file-id, in that order. If none of those headers are set, then neither is ETag. It’s not possible to express this level of complexity in a Transform Rule, but we can apply a little lateral thinking to the problem. We can easily concatenate the three headers to create a result that will change when any one or more of them changes:

concat(http.response.headers["x-bz-content-sha1"][0],
http.response.headers["x-bz-info-src_last_modified_millis"][0],
http.response.headers["x-bz-file-id"][0])

Note that it’s possible for there to be multiple values of a given HTTP header, so http.response.headers["<header-name>"] is an array. http.response.headers["<header-name>"][0] yields the first, and in most cases only, element of the array.
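The same concatenation can be sketched in Python to show what value the rule produces. This is only an illustration: build_etag is a hypothetical name, headers are modeled as lists of values to mirror the array behavior just described, and treating a missing header as an empty string is my assumption about how the pieces combine.

```python
from typing import Dict, List, Optional

# The three source headers, in the order they are concatenated above.
ETAG_SOURCES = [
    "x-bz-content-sha1",
    "x-bz-info-src_last_modified_millis",
    "x-bz-file-id",
]


def build_etag(headers: Dict[str, List[str]]) -> Optional[str]:
    """Concatenate the first value of each source header; None if all are absent."""
    parts = []
    for name in ETAG_SOURCES:
        values = headers.get(name, [])
        # [0] selects the first (and usually only) value of the header.
        parts.append(values[0] if values else "")
    etag = "".join(parts)
    return etag or None  # an empty result means no ETag is set


headers = {
    "x-bz-content-sha1": ["3cea1118fbaab607a7afd930480670970b278586"],
    "x-bz-info-src_last_modified_millis": ["1642192830529"],
    "x-bz-file-id": [
        "4_zf1f51fb913357c4f74ed0c1b_f1163cc3f37a60613_d20220119_m204457_c004_v0402000_t0044"
    ],
}
print(build_etag(headers))
```

Feeding in the header values from the first curl response above gives a single long string that changes whenever any of the three inputs changes.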

Edit the Transform Rule you just created, update its name to something like “Remove Backblaze B2 Headers, set ETag,” and add an ETag header with a dynamic value, using the concat expression above.

Don’t worry about the ordering; Cloudflare will reorder the operations so that “set” occurs before “remove.” Also, if none of those headers are present in the response, resulting in an empty value for the ETag header, Cloudflare will not set that header at all. Exactly the behavior we need!

Another test shows the result. Note that HTTP headers are not case-sensitive, so etag has just the same meaning as ETag:

% curl --head https://files.superpat.com/smiley.png
HTTP/2 200
date: Thu, 20 Jan 2022 02:01:19 GMT
content-type: image/png
content-length: 23889
x-bz-info-src_last_modified_millis: 1642192830529
cache-control: max-age=14400
cf-cache-status: HIT
age: 2198
last-modified: Thu, 20 Jan 2022 01:24:41 GMT
etag: 3cea1118fbaab607a7afd930480670970b27858616421928305294_zf1f51fb913357c4f74ed0c1b_f1163cc3f37a60613_d20220119_m204457_c004_v0402000_t0044

The other cache-related header is Cache-Control, which tells the browser how to cache the resource. As you can see in the above responses, Cloudflare sets Cache-Control to a max-age of 14400 seconds, or four hours.

James’ code, on the other hand, sets Cache-Control according to whether or not the request to B2 Cloud Storage is successful. For an HTTP status code of 200, Cache-Control is set to public, max-age=31536000, instructing the browser to cache the response for 31,536,000 seconds; in other words, a year. For any other HTTP status, Cache-Control is set to public, max-age=300, so the browser only caches the response for five minutes. In both cases, the public directive indicates that the response can be cached in a shared cache, even if the request contained an Authorization header field.
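Paraphrased in Python, that policy looks something like the sketch below. This is my restatement of the behavior described, not James’ actual Worker code, and cache_control_for_status is a hypothetical name. It also shows where the 31,536,000 figure comes from.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 31,536,000 seconds
SECONDS_PER_FIVE_MINUTES = 5 * 60      # 300 seconds


def cache_control_for_status(status: int) -> str:
    """Pick a Cache-Control value based on the upstream HTTP status code."""
    if status == 200:
        # Successful responses are treated as immutable and cached for a year.
        return f"public, max-age={SECONDS_PER_YEAR}"
    # Anything else is cached only briefly.
    return f"public, max-age={SECONDS_PER_FIVE_MINUTES}"


print(cache_control_for_status(200))
print(cache_control_for_status(404))
```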

Note: We’re effectively assuming that once created, files on the image host are immutable. This is often true for this use case, but you should think carefully about cache policy when you build your own solutions.

At present, Cloudflare Transform Rules do not give access to the HTTP status code, but, again, we can satisfy the requirement with a little thought and investigation. As mentioned above, for successful operations, Cloudflare sets Cache-Control to max-age=14400, or four hours. For failed operations, for example, requesting a non-existent object, Cloudflare passes back the Cache-Control header from Backblaze B2 of max-age=0, no-cache, no-store. With this information, it’s straightforward to construct a Transform Rule that increases max-age from 14400 to 31536000 for the successful case, by matching responses whose Cache-Control header is max-age=14400.

Again, we need to use [0] to select the first matching HTTP header. Notice that this rule uses a static value for the header—it’s the same for every matching response.
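A Python model of this rule, under the assumption that it matches on the exact value max-age=14400, might look like the following. extend_cache_lifetime is a hypothetical name, and headers are again modeled as lists of values so that [0] means the same thing as in the Rules language.

```python
def extend_cache_lifetime(headers: dict) -> dict:
    """Replace Cloudflare's default four-hour Cache-Control with a one-year value."""
    values = headers.get("cache-control", [])
    # Match only the default set for successful responses.
    if values and values[0] == "max-age=14400":
        updated = dict(headers)
        # Static value: identical for every matching response.
        updated["cache-control"] = ["public, max-age=31536000"]
        return updated
    # Failure cases keep whatever Backblaze B2 sent.
    return headers


print(extend_cache_lifetime({"cache-control": ["max-age=14400"]}))
print(extend_cache_lifetime({"cache-control": ["max-age=0, no-cache, no-store"]}))
```

Matching on the header value stands in for the status code test that Transform Rules cannot perform directly.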

We’ll leave the header as it’s set by B2 Cloud Storage for failure cases, though it would be just as easy to override it.

Another test shows the results of our efforts:

% curl --head https://files.superpat.com/smiley.png
HTTP/2 200
date: Thu, 20 Jan 2022 02:31:38 GMT
content-type: image/png
content-length: 23889
x-bz-info-src_last_modified_millis: 1642192830529
cache-control: public, max-age=31536000
cf-cache-status: HIT
age: 4017
last-modified: Thu, 20 Jan 2022 01:24:41 GMT
etag: 3cea1118fbaab607a7afd930480670970b27858616421928305294_zf1f51fb913357c4f74ed0c1b_f1163cc3f37a60613_d20220119_m204457_c004_v0402000_t0044

Checking the failure case—notice that there is no ETag header, since B2 Cloud Storage did not return any x-bz headers:

% curl --head https://files.superpat.com/badname.png
HTTP/2 404
date: Thu, 20 Jan 2022 02:32:35 GMT
content-type: application/json;charset=utf-8
content-length: 94
cache-control: max-age=0, no-cache, no-store
cf-cache-status: BYPASS

Success! Browsers and caches will aggressively cache responses, reducing the burden on Cloudflare and Backblaze B2.

Set a CORS Header for Image Files

We’re almost done! Our final requirement is to set a cross-origin resource sharing (CORS) header for images so that they can be manipulated in web pages from any domain on the web.

The Transform Rule must match a range of file extensions, and set the Access-Control-Allow-Origin HTTP response header to *, allowing any webpage to access the resources.
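In Python terms, the rule behaves roughly like the sketch below. The list of extensions is an assumption chosen for illustration, so adjust it to the image types you actually host; add_cors_header and IMAGE_EXTENSIONS are hypothetical names.

```python
import posixpath

# Assumed set of image extensions; not taken from the rule itself.
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".gif", ".webp", ".svg"}


def add_cors_header(path: str, headers: dict) -> dict:
    """Return a copy of headers, adding the CORS header for image paths only."""
    extension = posixpath.splitext(path)[1].lower()
    result = dict(headers)
    if extension in IMAGE_EXTENSIONS:
        # "*" lets pages on any origin use the image.
        result["access-control-allow-origin"] = "*"
    return result


print(add_cors_header("/smiley.png", {"content-type": "image/png"}))
print(add_cors_header("/hello.txt", {"content-type": "text/plain"}))
```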

Upload a text file and run a final couple of tests to see the results. First, the image:

% curl --head https://files.superpat.com/smiley.png
HTTP/2 200
date: Thu, 20 Jan 2022 02:50:52 GMT
content-type: image/png
content-length: 23889
x-bz-info-src_last_modified_millis: 1642192830529
cache-control: public, max-age=31536000
cf-cache-status: HIT
age: 4459
last-modified: Thu, 20 Jan 2022 01:36:33 GMT
etag: 3cea1118fbaab607a7afd930480670970b27858616421928305294_zf1f51fb913357c4f74ed0c1b_f1163cc3f37a60613_d20220119_m204457_c004_v0402000_t0044
access-control-allow-origin: *

The Access-Control-Allow-Origin header is present, as expected.

Finally, the text file, without an Access-Control-Allow-Origin header. You can use the --include argument rather than --head to see the file content as well as the headers:

% curl --include https://files.superpat.com/hello.txt
HTTP/2 200
date: Thu, 20 Jan 2022 02:48:51 GMT
content-type: text/plain
content-length: 14
accept-ranges: bytes
x-bz-info-src_last_modified_millis: 1642646740075
cf-cache-status: DYNAMIC
etag: 60fde9c2310b0d4cad4dab8d126b04387efba28916426467400754_zf1f51fb913357c4f74ed0c1b_f1092902424a40504_d20220120_m024635_c004_v0402003_t0000

Hello, World!

Troubleshooting

The most frequent issue I encountered while getting all this working was mixing up request and response when referencing HTTP headers. If things are not working as expected, double check that you don’t have http.response.headers["<header-name>"] where you need http.request.headers["<header-name>"] or vice versa.

Can I Really Do This Free of Charge?

Backblaze B2 pricing is very simple:

Storage
  • The first 10GB of storage is free of charge.
  • Above 10GB, we charge $0.005/GB/month, around a quarter of the cost of other leading cloud object stores (cough, S3, cough).
  • Storage cost is calculated hourly, with no minimum retention requirement, and billed monthly.
Downloaded Data
  • The first 1GB of data downloaded each day is free.
  • Above 1GB, we charge $0.01/GB, but…
  • Downloads through our CDN and compute partners, of which Cloudflare is one, are free.
Transactions
  • Each download operation counts as one class B transaction.
  • The first 2,500 class B transactions each day are free.
  • Beyond that, class B transactions are charged at a rate of $0.004 per 10,000.
No Surprise Bills
  • If you already signed up for Backblaze B2, you might have noticed that you didn’t have to provide a credit card number. Your 10GB of free storage never expires, and there is no chance of you unexpectedly incurring any charges.

By serving your images via Cloudflare’s global CDN and optimizing your cache configuration as described above, you will incur no download costs from B2 Cloud Storage, and likely stay well within the 2,500 free download operations per day. Similarly, Cloudflare’s free plan does not require a credit card for activation, and there are no data or transaction limits.
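If you want to check the arithmetic for your own usage, here is a small estimator built from the prices listed above. It is a simplification with several assumptions of mine: a 30-day month, constant storage and traffic throughout the month, and all downloads going through Cloudflare (so download charges are omitted). monthly_cost is a hypothetical helper, not a Backblaze tool.

```python
FREE_STORAGE_GB = 10
STORAGE_PRICE_PER_GB_MONTH = 0.005  # dollars
FREE_CLASS_B_PER_DAY = 2500
CLASS_B_PRICE_PER_10K = 0.004       # dollars


def monthly_cost(stored_gb: float, class_b_per_day: int, days: int = 30) -> float:
    """Estimate a month's B2 charges in dollars for steady storage and traffic."""
    # Storage: only the amount above the free 10GB is billed.
    billable_gb = max(0, stored_gb - FREE_STORAGE_GB)
    storage = billable_gb * STORAGE_PRICE_PER_GB_MONTH
    # Transactions: the free allowance resets every day.
    billable_tx = max(0, class_b_per_day - FREE_CLASS_B_PER_DAY) * days
    transactions = billable_tx / 10_000 * CLASS_B_PRICE_PER_10K
    return round(storage + transactions, 4)


print(monthly_cost(10, 2500))     # within the free allowances
print(monthly_cost(110, 12500))   # 100GB and 10,000 transactions/day over
```

The first call stays entirely inside the free allowances; the second shows how slowly charges grow once you exceed them.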

Sign up for Backblaze B2 today, deploy your own personal image host, explore our off-the-shelf integrations, and consider what you can create with an affordable, S3-compatible cloud object storage platform.

The post Free Image Hosting With Cloudflare Transform Rules and Backblaze B2 appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.