
AI Reasoning Models: OpenAI o3-mini, o1-mini, and DeepSeek R1

Post Syndicated from Molly Clancy original https://www.backblaze.com/blog/ai-reasoning-models-openai-o3-mini-o1-mini-and-deepseek-r1/

A decorative image showing an AI chip connecting icons representing different files.

If you haven’t been able to keep pace with the AI news cycle, you’d be forgiven. I work at a tech company, and it’s felt like bailing water with a teacup over the past few weeks. But the term that keeps rising to the top of the flotsam in the boat is this: reasoning models. The regular ol’ models that power ChatGPT, Gemini, and Claude are cool and all, but reasoning models, specifically those from DeepSeek and OpenAI, are what you should keep an eye on as an enterprise tech leader.

In the spirit of our AI 101 series, I’ll do my level best to recap the finer points and decode some of the more esoteric terms you’re likely to encounter. (Like: WTH is a “mixture of experts”? That sounds like a party I want to be invited to, but will definitely skip at the last minute.)

The reasoning model releases: OpenAI o1-mini, DeepSeek R1, and OpenAI o3-mini

The last few weeks and months have seen a flurry of activity in the AI space, with reasoning models taking center stage. The TL;DR is that reasoning models are LLMs that can self-correct before delivering a response to a prompt, though they take a little longer to respond than your standard LLM.

Here are the releases that you should know about.

OpenAI o1-mini: September 12, 2024

It seems like a lifetime ago, but OpenAI released its o1-mini model back in September. o1-mini wasn’t the first reasoning model to go to market (models from Google, DeepMind, Anthropic, and Meta dabbled in reasoning for specific tasks), but it was more cost-efficient at inference: 80% cheaper than the o1-preview model. What you need to know:

  • Yes, o1-preview and o1-mini were released at the same time, which is confusing. Without getting into the weeds, here’s the difference: pricing. o1-preview was the most expensive OpenAI model on offer at $15/1M input tokens and $60/1M output tokens, versus mini’s $3/1M input and $12/1M output. (You can think of tokens as the small chunks of text, often word fragments, that a prompt or a response is broken into before the model processes it.)
  • o1-preview (the expensive one) was purported at the time to perform “similarly to PhD students on challenging benchmark tasks in physics, chemistry, and biology.”
  • o1-mini (the 80% cheaper one) was designed to be particularly well-suited for coding tasks.

DeepSeek R1: January 20, 2025

Unless you’ve been living under a rock, you’ve heard about this one. DeepSeek rattled the AI industry and financial markets with its release of R1, challenging OpenAI’s models on performance, pricing, and open-source availability. (We love a good open-source release.) What you need to know:

  • DeepSeek R1 delivers results comparable to OpenAI’s o1 models, both preview and mini, on math and coding benchmarks, while being trained on far fewer GPUs. Best-guess estimates put it at around 60,000 GPUs, while industry leaders like OpenAI and Anthropic exceed 500k each.
  • R1 is also much cheaper to use, at $0.14/1M input tokens and $2.19/1M output tokens.
  • These efficiency claims could have far-reaching impacts for enterprises looking to build AI at a fraction of the cost. (The DeepSeek platform page has been down since we tasked one of our favorite tech evangelists with testing it, but stay tuned for a deep dive on how it works.)
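To make the per-token prices above concrete, here’s a back-of-the-envelope comparison in Python. The workload (10M input tokens and 2M output tokens a month) is a made-up example, the prices are the per-million-token figures quoted in this post, and the sketch ignores caching discounts and any price changes since these models launched.

```python
# Per-million-token list prices quoted in this post (USD).
PRICES = {
    "o1-preview": {"input": 15.00, "output": 60.00},
    "o1-mini": {"input": 3.00, "output": 12.00},
    "DeepSeek R1": {"input": 0.14, "output": 2.19},
}


def monthly_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the cost in USD of processing the given numbers of tokens."""
    price = PRICES[model]
    return (input_tokens / 1_000_000) * price["input"] + (
        output_tokens / 1_000_000
    ) * price["output"]


if __name__ == "__main__":
    # Hypothetical workload: 10M input tokens and 2M output tokens per month.
    for name in PRICES:
        print(f"{name:12} ${monthly_cost(name, 10_000_000, 2_000_000):,.2f}")
```

For this workload, o1-mini comes out at exactly one fifth of o1-preview’s cost, which is the 80% saving mentioned earlier.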

OpenAI o3-mini: January 31, 2025

OpenAI previewed o3 in December and brought o3-mini to general availability (GA) just 11 days after DeepSeek joined the party. What you need to know:

I’m admittedly cherry-picking these releases a bit to keep things simple. Suffice it to say, there are a lot of models, even within OpenAI’s o-series, but these are the ones of note, at least as far as recent events are concerned.

What is reasoning anyway?

You might see reasoning described as the model “thinking” before it delivers an answer, but don’t be fooled. AI cannot yet “think” or, to be fair, “reason” in the ways that we apply those terms to humans. To describe what these models actually do, I need to use a word salad of jargon. I’m sorry; definitions follow. Reasoning models use chain-of-thought prompting to guide decision-making, incorporating self-improvement mechanisms and using test-time thinking to make real-time adjustments.

  • Chain-of-thought (CoT) prompting: Models break problems into logical steps (e.g., solving math problems via intermediate equations).
  • Self-improvement mechanisms: Techniques like the Self-Taught Reasoner (STaR) enable iterative refinement of reasoning through automated feedback loops.
  • Test-time thinking: Models do extra work at inference time, such as generating and checking intermediate reasoning steps for the specific input, rather than relying solely on what they learned in training or on fixed strategies.
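As a rough illustration of chain-of-thought prompting, the sketch below builds a few-shot prompt in which a worked example spells out its intermediate steps, nudging a model to answer the new question the same way. The example problem and wording are made up for illustration; reasoning models such as o1 and R1 are trained to produce this kind of step-by-step reasoning on their own rather than needing it in the prompt.

```python
# A worked example whose answer shows its intermediate steps (hypothetical content).
COT_EXAMPLE = (
    "Q: A server has 60 drive slots and 45 are filled. "
    "If 5 more drives are installed, how many slots are empty?\n"
    "A: Let's think step by step. 45 + 5 = 50 slots are filled. "
    "60 - 50 = 10. The answer is 10.\n"
)


def build_cot_prompt(question: str) -> str:
    """Prepend the worked example and cue the model to reason step by step."""
    return f"{COT_EXAMPLE}\nQ: {question}\nA: Let's think step by step."


if __name__ == "__main__":
    print(build_cot_prompt("A vault has 20 servers with 60 drives each. How many drives is that?"))
```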

Here are a few more terms you might come across for good measure: 

  • Inference compute: The computational power needed to run a reasoning model and generate predictions or outputs based on new data after the model has been trained.
  • Mixture of experts approach: Using multiple specialized sub-networks (“experts”) within a model that handle different kinds of input, and applying a gating mechanism (a router) to select the most relevant expert or experts for each input. Of note: DeepSeek used this approach to create efficiencies.
  • Distillation: Using inputs and outputs from one model to train another model. Of note: OpenAI alleges this is how DeepSeek “stole” its IP.
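To make the gating idea concrete, here’s a toy mixture-of-experts layer in plain Python: a router scores every expert for an input, only the top-k experts actually run, and their outputs are mixed using a softmax over the selected scores. The “experts” here are trivial functions and the router scores are passed in directly; in real models both are learned neural network layers and routing typically happens per token. This is a sketch of the general technique, not DeepSeek’s implementation.

```python
import math
from typing import Callable, Sequence


def softmax(scores: Sequence[float]) -> list[float]:
    """Numerically stable softmax."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def moe_forward(
    x: float,
    experts: Sequence[Callable[[float], float]],
    router_scores: Sequence[float],
    k: int = 2,
) -> tuple[float, list[int]]:
    """Run only the k highest-scoring experts and mix their outputs.

    Returns the combined output and the indices of the experts that ran.
    """
    # Indices of the k highest router scores (ties keep their original order).
    top = sorted(range(len(experts)), key=lambda i: router_scores[i], reverse=True)[:k]
    # Renormalize the selected scores so the mixing weights sum to 1.
    weights = softmax([router_scores[i] for i in top])
    # Experts that were not selected are never evaluated.
    output = sum(w * experts[i](x) for w, i in zip(weights, top))
    return output, top


if __name__ == "__main__":
    experts = [lambda v: v + 1, lambda v: v * 2, lambda v: v ** 2, lambda v: -v]
    out, used = moe_forward(3.0, experts, router_scores=[0.1, 2.0, 2.0, -1.0], k=2)
    print(used, out)  # [1, 2] 7.5
```

The efficiency comes from the fact that most experts sit idle for any given input, so compute per input scales with k rather than with the total number of experts.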

This is all pretty cool, if linguistically painful, stuff, and it means that reasoning models are shifting perceptions of model capabilities. But they’re not without persistent challenges. Like other LLMs, they still fail on complex reasoning tasks, offer little transparency into how they were trained, and exhibit cognitive biases.

Why should you care?

If the past two weeks (and, really, the past two years) are any indication, AI innovation will continue at a blistering pace. Reasoning models, and LLMs in general, will become more diverse and more specialized for narrower tasks as the core technology is increasingly commoditized and cheaper. And it’s worth noting that this is a totally normal, and expected, lifecycle for new technology.

What does it all mean for enterprises looking to build AI into their operations? Two key takeaways:

  • Don’t overcommit on any one toolset or investment: Test out OpenAI, DeepSeek, Gemini, Alibaba’s Qwen, and others. Stay ahead of the changing landscape and new models; stay nimble, and keep experimenting.
  • Take care of your data: What makes these models valuable for your company isn’t so much their capabilities as your data. You need to retain it in storage that’s reliable, easy to access, and doesn’t lock you out of AI experimentation with exorbitant egress fees.

Even as AI models get better, having those fundamentals in place can only help your business and set you up to better leverage AI when it’s right for your operations.

The post AI Reasoning Models: OpenAI o3-mini, o1-mini, and DeepSeek R1 appeared first on Backblaze Blog | Cloud Storage & Cloud Backup

Building an AI Agent with Backblaze B2, LangChain, and Drive Stats

Post Syndicated from Pat Patterson original https://www.backblaze.com/blog/building-an-ai-agent-with-backblaze-b2-langchain-and-drive-stats/

A decorative image showing multiple computer windows folding into the cloud.

Last August, I explained how you can use a Jupyter Notebook to explore AI development; specifically, building a chatbot that answers questions based on custom context downloaded from a private bucket in Backblaze B2 Cloud Storage. 

In this post, I’ll look at another AI technology, agents, and show you how I built an AI agent that answers questions about hard drive reliability based on over 11 years of raw data from our Drive Stats franchise. 

The Drive Stats dataset is ideal for this kind of work. It’s a real-world dataset, but it weighs in at only around 500 million records, consuming about 20GB of storage in Parquet format (“only” being a relative term), so you can use it with big data and AI tools on a laptop in a reasonable amount of time rather than spinning up an expensive virtual machine (VM) and/or spending days waiting for an operation to complete. As an example, converting the entire Drive Stats dataset from CSV to Parquet using a Python app on my MacBook Pro takes a couple of hours. On the same hardware, converting a terabyte-scale dataset would take about four days.

Speaking of Drive Stats

The Drive Stats 2024 report comes out February 11, and we’re hosting a LinkedIn Live event where Andy Klein, resident Drive Stats guru, will share highlights. Register today to save your spot.


You can use these same techniques with any large dataset, from healthcare to ecommerce to financial services. In this example, we’re working with a single table, but you could adapt the sample code to a data lake comprising any number of tables.

What is an AI agent?

In the spirit of the times, I posed this question to ChatGPT. Its answer:

An AI agent is a software system designed to autonomously perform tasks or make decisions based on its environment and goals. It leverages artificial intelligence techniques—such as machine learning, reasoning, and natural language processing—to process information, make decisions, and take actions to achieve specific objectives.

Key components of an AI agent include:

  1. Perception: The ability to sense and understand its environment. This could be through sensors, input data, or other means of gathering information.
  2. Reasoning/decision-making: The core processing mechanism that helps the agent interpret its environment, make decisions, and plan actions. It could use various algorithms, such as decision trees, reinforcement learning, or neural networks.
  3. Action: Once the agent has analyzed the environment and made a decision, it takes action to achieve its goal, whether it’s performing an operation, giving a recommendation, or interacting with another system.
  4. Learning: Some AI agents can adapt over time, improving their decision-making and actions based on experience (via reinforcement learning, supervised learning, etc.).

AI agents can range from simple systems, like chatbots or virtual assistants, to more complex systems like autonomous vehicles, robots, or financial trading algorithms.

In general, the term “agent” emphasizes the idea of autonomy—the agent operates independently, often with the ability to learn, adapt, and make decisions based on changing conditions without direct human intervention.

In this example, the agent’s environment is a database containing the Drive Stats data (more on that below), and I want it to perform the following tasks:

  • Based on a natural language question, such as “Which drive has the lowest annual failure rate?”, generate a SQL query that retrieves data that will help answer the question.
  • Execute that query against the Drive Stats dataset.
  • Based on the query results, either create a new query that better answers the question, or generate a natural language answer.

As in my previous post, I’m using the open-source LangChain framework. This tutorial on building a question-answering system over SQL data was my starting point. I’ll explain key points of the integration in this blog post; the full source code is available as a Jupyter notebook in the ai-agent-demo repository.

Querying the Drive Stats dataset

Now that I’ve established that my agent will be writing a SQL query, the next question is, “What will it be querying?” I’ve written about querying the Drive Stats dataset before; in that blog post I explained how I wrote a Python script to convert the Drive Stats data from the CSV format in which we publish it to Apache Parquet, a column-oriented file format particularly well-suited for storing tabular data for use in analytical queries, and upload it to a Backblaze B2 Bucket using the Apache Hive table format. There’s a broad ecosystem of tools and platforms that can manipulate Parquet data in object storage (for example, Apache Spark and Snowflake) and I chose Trino, the open source distributed SQL engine that forms the basis for Amazon Athena, to execute queries against the data.

I could have used the same technologies for this exercise, but I decided to add Apache Iceberg to the mix. While Parquet is a file format that specifies how tabular data is stored in files, Iceberg is a table format that governs how those files can be combined and interpreted as a database table. Iceberg provides a number of advantages over Hive as a table format, including better performance and much more flexible data partitioning.

What is partitioning?

Partitioning splits a dataset on one or more column values, easing data management and improving performance when a query includes a partition column.

Partitioning by year and month makes sense for the Drive Stats dataset—the resulting Parquet files are in the hundreds of megabytes, the sweet spot for Parquet data. To apply this partitioning to the Drive Stats data using the Hive table format, I had to create otherwise redundant month and year columns from the existing date column, complicating the schema.

Iceberg, by contrast, supports hidden partitioning, allowing you to apply a transformation to a column value to produce a partition value without adding any new columns. With the Drive Stats data, that meant I could simply define the partitioning as month(date) (the resulting value being the number of months since 1/1/1970, rather than an integer between 1 and 12), with no need to create any additional columns.
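The month transform is simple arithmetic, which makes it easy to check which partition a given date lands in. This is a small illustrative sketch of that calculation, not code taken from Iceberg itself.

```python
from datetime import date


def iceberg_month(d: date) -> int:
    """Months since January 1970: the partition value Iceberg's month() transform yields."""
    return (d.year - 1970) * 12 + (d.month - 1)


if __name__ == "__main__":
    # The first and last days of September 2024 land in the same partition.
    print(iceberg_month(date(2024, 9, 1)), iceberg_month(date(2024, 9, 30)))  # 656 656
```

In Trino’s Iceberg connector, this kind of partitioning is declared when creating the table, via the partitioning table property, for example partitioning = ARRAY['month(date)'].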

LangChain’s SQLDatabase class provides access to databases via the SQLAlchemy open-source Python library. The demo code obtains a SQLDatabase instance by providing a URI containing the trino scheme, a username, the host and port of the Trino server, and the catalog and schema to query (here, iceberg and drivestats):

from langchain_community.utilities import SQLDatabase
db = SQLDatabase.from_uri('trino://admin@localhost:8080/iceberg/drivestats')

Note: In this and other code excerpts in this blog post, I’ve omitted extraneous “boilerplate” code. As mentioned above, the full source code is available in the ai-agent-demo repository.

As you can infer from the localhost domain name, I’m running Trino on my laptop. I’m actually running it in Docker, using the Iceberg/Hive Docker Compose script from the trino-getting-started-b2 repository. I’ll dive into that example in a future blog post.

A simple query confirms that we have a successful database connection:

db.run("SELECT COUNT(*) FROM drivestats")

'[(537220724,)]'

As the result conveys, there are over 537 million rows in the Drive Stats dataset.

Each row contains the metrics collected from a single drive in the Backblaze fleet on a specific day. The schema has evolved over time, but, currently, the following columns are included:

  • date: The date of collection.
  • serial_number: The unique serial number of the drive.
  • model: The manufacturer’s model number of the drive.
  • capacity_bytes: The drive’s capacity in bytes.
  • failure: 1 if this was the last day that the drive was operational before failing, 0 if all is well.
  • pod_slot_num: The physical location of a drive within a storage server, as an integer from 0 to 59. The specific slot differs based on the storage server type and capacity: Backblaze (45 or 60 drives), Dell (26 drives), or Supermicro (60 drives).
  • pod_id: There are 20 storage servers in each Backblaze Vault. The pod_id is a numeric field with values from 0 to 19 assigned to each of the 20 storage servers.
  • vault_id: All data drives are members of a Backblaze Vault. Each Vault consists of either 900 or 1,200 hard drives divided evenly across 20 storage servers. The vault_id is a numeric value starting at 1,000.
  • cluster_id: The name of a given collection of storage servers logically grouped together to optimize system performance, formatted as a numeric field with up to two digits. Note: At this time the cluster_id is not always correct; we are working on fixing that.
  • datacenter: The Backblaze data center where the drive is installed, currently one of ams5 (Amsterdam, Netherlands), iad1 (Reston, Virginia), phx1 (Phoenix, Arizona), sac0 (Sacramento, California), sac2 (Stockton, California) or, now live, yyz1, our new Toronto, Ontario, data center.
  • is_legacy_format: Currently 0, but may change in the future as more fields are added.
  • A collection of SMART attributes. The number of attributes collected has risen over time; currently we store 93 SMART attributes in each record, each one in both raw and normalized form, with field names of the form smart_n_normalized and smart_n_raw, where n is between 1 and 255.

Using OpenAI to generate a SQL query

For this project, I decided to use the OpenAI API, rather than running a large language model (LLM) directly on my laptop. LangChain has a chat model integration for OpenAI, as well as many other providers, so you could use, for example, a local Llama model (via ChatOllama) or one of the Claude models (via ChatAnthropic) if you prefer.

To use the OpenAI API, you must sign up for an OpenAI account and create an OpenAI API key. This code loads the API key from a .env file and creates a chat model instance using OpenAI’s GPT-4o mini model:

from dotenv import load_dotenv
from langchain_openai import ChatOpenAI

# OPENAI_API_KEY must be defined in the .env file
load_dotenv()
llm = ChatOpenAI(model="gpt-4o-mini")

Now we need a system prompt template. We’ll combine this with the database schema and a natural language question to form the prompt that we send to OpenAI. As in the LangChain tutorial, I’m using a prompt from the LangChain Prompt Hub:

from langchain import hub

query_prompt_template = hub.pull("langchain-ai/sql-query-system-prompt")
query_prompt_template.messages[0].pretty_print()

This is the prompt template text, with the placeholders shown in {braces}:

================================ System Message ================================

Given an input question, create a syntactically correct {dialect} query to run to help find the answer. Unless the user specifies in his question a specific number of examples they wish to obtain, always limit your query to at most {top_k} results. You can order the results by a relevant column to return the most interesting examples in the database.

Never query for all the columns from a specific table, only ask for a few relevant columns given the question.

Pay attention to use only the column names that you can see in the schema description. Be careful to not query for columns that do not exist. Also, pay attention to which column is in which table.

Only use the following tables:
{table_info}

Question: {input}

Notice how the template requires you to specify the correct SQL dialect, constrains the number of results returned, and discourages the model from hallucinating column names that do not exist in the schema.

A helper function populates the prompt template, sends it to the model, and returns the generated SQL query:

def write_query(state: State):
    prompt = query_prompt_template.invoke(
        {
            "dialect": db.dialect,
            "top_k": 10,
            "table_info": db.get_table_info(),
            "input": state["question"],
        }
    )
    structured_llm = llm.with_structured_output(QueryOutput)
    result = structured_llm.invoke(prompt)
    return {"query": result["query"].rstrip(';')}
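The helper refers to two types that the excerpt doesn’t show: State, the dictionary that flows between the steps, and QueryOutput, the schema passed to with_structured_output(). The definitions below are a sketch modeled on the ones in the LangChain SQL Q&A tutorial; they use the standard library’s typing module (the tutorial imports from typing_extensions), and the notebook in the ai-agent-demo repository may define them differently.

```python
from typing import Annotated, TypedDict


class State(TypedDict):
    """Data passed between the steps of the flow."""
    question: str  # natural language question from the user
    query: str     # SQL generated by the model
    result: str    # result set returned by the database, as a string
    answer: str    # natural language answer generated by the model


class QueryOutput(TypedDict):
    """Generated SQL query."""
    # Annotated[type, default, description]: LangChain includes the description
    # in the output schema it sends to the model.
    query: Annotated[str, ..., "Syntactically valid SQL query."]
```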

We can test the helper function by calling it directly with a Python dictionary containing a simple question:

question = {"question": "How many drives are there?"}
query = write_query(question)

The resulting query dictionary does indeed contain a valid SQL query, but it won’t give us the answer we are looking for.

{'query': 'SELECT COUNT(*) AS drive_count FROM drivestats'}

That query will tell us how many rows there are in the dataset, rather than how many drives. We supplied the database schema to the model, but we haven’t given it any information on the semantics of the columns in the drivestats table. We can provide a bit more detail to obtain the correct query:

question = {"question": "Each drive has its own serial number. How many drives are there?"}
query = write_query(question)

This time, the generated SQL query is correct:

{'query': 'SELECT COUNT(DISTINCT serial_number) AS total_drives FROM drivestats'}

As you can see, it’s important to check the output of AI models—they can and do generate unexpected results.
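The difference between the two generated queries is easy to reproduce with a toy table. This sketch uses Python’s built-in sqlite3 module and a few made-up rows (the serial numbers are invented) with the same one-row-per-drive-per-day layout as drivestats: COUNT(*) counts drive days, while COUNT(DISTINCT serial_number) counts drives.

```python
import sqlite3


def count_rows_and_drives(rows: list[tuple[str, str, int]]) -> tuple[int, int]:
    """Load (date, serial_number, failure) rows and return (row count, distinct drives)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE drivestats (date TEXT, serial_number TEXT, failure INTEGER)")
    conn.executemany("INSERT INTO drivestats VALUES (?, ?, ?)", rows)
    total_rows = conn.execute("SELECT COUNT(*) FROM drivestats").fetchone()[0]
    drives = conn.execute("SELECT COUNT(DISTINCT serial_number) FROM drivestats").fetchone()[0]
    conn.close()
    return total_rows, drives


if __name__ == "__main__":
    # Hypothetical data: two drives reporting daily; drive B fails on 9/2 and stops reporting.
    sample = [
        ("2024-09-01", "A", 0), ("2024-09-01", "B", 0),
        ("2024-09-02", "A", 0), ("2024-09-02", "B", 1),
        ("2024-09-03", "A", 0),
    ]
    print(count_rows_and_drives(sample))  # (5, 2)
```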

A second helper function executes the query against the database:

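from langchain_community.tools.sql_database.tool import QuerySQLDatabaseTool

def execute_query(state: State):
    # Run the generated SQL against the database; the result comes back as a string
    execute_query_tool = QuerySQLDatabaseTool(db=db)
    return {"result": execute_query_tool.invoke(state["query"])}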

We can test it using the (correct) generated query:

result = execute_query(query)

{'result': '[(430464,)]'}

We need one more helper function to pass the result set to the model and have it generate a natural language response. This time, we define our own prompt:

def generate_answer(state: State):
    prompt = (
        "Given the following user question, corresponding SQL query, "
        "and SQL result, answer the user question.\n\n"
        f'Question: {state["question"]}\n'
        f'SQL Query: {state["query"]}\n'
        f'SQL Result: {state["result"]}'
    )
    response = llm.invoke(prompt)
    return {"answer": response.content}

Again, we can test it in isolation. Notice that we have to provide the question and query, as well as the result, so that the model has the context it needs (the | operator, available since Python 3.9, merges the three dictionaries into one):

answer = generate_answer(question | query | result)
answer['answer']

'There are 430,464 drives.'

Success! At the time of writing, there are indeed 430,464 drives in the Drive Stats dataset.

LangChain’s LangGraph orchestration framework allows us to compile our three helper functions into a single graph object:

from langgraph.graph import START, StateGraph

graph_builder = StateGraph(State).add_sequence(
    [write_query, execute_query, generate_answer]
)
graph_builder.add_edge(START, "write_query")
graph = graph_builder.compile()

We can visualize the flow in the notebook:

from IPython.display import Image, display
display(Image(graph.get_graph().draw_mermaid_png()))
A diagram showing a query workflow. The workflow is defined as start, write_query, execute_query, generate_answer.

Our graph runs model-generated SQL: write_query produces a query and execute_query runs it against the database. I’ll quote the security note from the LangChain tutorial on the inherent risks in doing so:

Building Q&A systems of SQL databases requires executing model-generated SQL queries. There are inherent risks in doing this. Make sure that your database connection permissions are always scoped as narrowly as possible for your chain/agent’s needs. This will mitigate though not eliminate the risks of building a model-driven system. For more on general security best practices, see here.

In this example, we are querying a public dataset, and I followed best practice by configuring Trino’s Iceberg connector with a read-only application key scoped to the bucket containing the Drive Stats Iceberg tables.
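The same principle applies at any scale. As a small, self-contained analogy (SQLite rather than Trino, and a throwaway temporary file rather than a bucket), the sketch below opens a database through a read-only URI connection, so any write that a model-generated statement attempts is rejected by the connection itself rather than depending on the model to behave.

```python
import os
import sqlite3
import tempfile


def open_read_only(path: str) -> sqlite3.Connection:
    """Open an existing SQLite database file in read-only mode."""
    return sqlite3.connect(f"file:{path}?mode=ro", uri=True)


def try_statement(conn: sqlite3.Connection, sql: str) -> str:
    """Run a statement and report the outcome as a string instead of raising."""
    try:
        return str(conn.execute(sql).fetchall())
    except sqlite3.OperationalError as e:
        return f"Error: {e}"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "drivestats.db")
        # Create and populate the file with an ordinary read-write connection.
        setup = sqlite3.connect(path)
        setup.execute("CREATE TABLE drivestats (serial_number TEXT)")
        setup.execute("INSERT INTO drivestats VALUES ('A')")
        setup.commit()
        setup.close()

        ro = open_read_only(path)
        print(try_statement(ro, "SELECT COUNT(*) FROM drivestats"))  # [(1,)]
        print(try_statement(ro, "DELETE FROM drivestats"))  # Error: attempt to write a readonly database
        ro.close()
```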

Now let’s stream a new question through the flow. This mode of operation displays the output of each step as it is executed, which is essential for understanding the flow’s behavior, particularly when it behaves unexpectedly. The model returns structured text in Markdown format. With a couple of lines of code to extract the message from the step variable, we can use the display_markdown function to render each step’s output:

from IPython.display import display_markdown
from more_itertools import one

for step in graph.stream(
    {"question": "Each drive has its own serial number. How many drives did each data center have on 9/1/2024"}, stream_mode="updates"
):
    # each step is {node name: {state key: value}}; unwrap both single-entry dicts to get the markdown message
    state = one(step.values())
    message = one(state.values())
    display_markdown(message, raw=True)

This is the flow’s output: three messages, one from each step. I’ve separated them with a horizontal line for clarity:

SELECT datacenter, COUNT(DISTINCT serial_number) AS drive_count FROM drivestats WHERE date = '2024-09-01' GROUP BY datacenter ORDER BY drive_count DESC LIMIT 10


Error: (trino.exceptions.TrinoUserError) TrinoUserError(type=USER_ERROR, name=TYPE_MISMATCH, message="line 3:12: Cannot apply operator: date = varchar(10)", query_id=20250113_221649_00214_bsut5) [SQL: SELECT datacenter, COUNT(DISTINCT serial_number) AS drive_count FROM drivestats WHERE date = '2024-09-01' GROUP BY datacenter ORDER BY drive_count DESC LIMIT 10] (Background on this error at: https://sqlalche.me/e/20/f405)


The SQL query encountered an error because it attempted to compare a date column with a string in the format ‘YYYY-MM-DD’. Specifically, the error message indicates that the date column is of a different type (likely not a string), which is causing the type mismatch.

To answer the user question about how many drives each data center had on 9/1/2024, the SQL query needs to be corrected. Here’s the revised query:

SELECT datacenter, COUNT(DISTINCT serial_number) AS drive_count 
FROM drivestats 
WHERE date = DATE '2024-09-01' 
GROUP BY datacenter 
ORDER BY drive_count DESC 
LIMIT 10

This corrected query uses the DATE keyword to ensure that the date string is properly interpreted as a date type, which should resolve the type mismatch error. If executed successfully, this query will provide the count of distinct drives in each data center for the specified date. However, without executing the corrected query against the database, we cannot provide the exact counts.

As you can see in the output from the first step, the model generated an invalid query, comparing a date to a string, despite the database schema being included in the prompt. The output of the second step contains the resulting error message from the database, while the third step contains the model’s diagnosis of the error.

This exchange highlights a limitation of a flow that is simply a linear series of steps, such as write_query, execute_query, and generate_answer. We cannot rely on the model to generate a valid SQL query, although it is able to point the way towards resolving its error.
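A loop fixes this. Here’s a framework-free sketch of the pattern the ReAct agent follows in spirit: generate a query, run it, and if the database returns an error, hand the error back to the query generator and try again, up to a fixed number of attempts. SQLite stands in for Trino, and the model is replaced by a hypothetical stub, fake_model, whose first query uses a nonexistent column and which “fixes” it after seeing the error. This illustrates the idea only; it is not how LangGraph is implemented.

```python
import sqlite3
from typing import Callable, Optional


def answer_with_retries(
    conn: sqlite3.Connection,
    question: str,
    generate_sql: Callable[[str, Optional[str]], str],
    max_attempts: int = 3,
) -> tuple[list, list[str]]:
    """Ask generate_sql for a query, retrying with the last error until one succeeds."""
    error: Optional[str] = None
    attempts: list[str] = []
    for _ in range(max_attempts):
        sql = generate_sql(question, error)
        attempts.append(sql)
        try:
            return conn.execute(sql).fetchall(), attempts
        except sqlite3.Error as e:
            # Feed the database's error message into the next generation step.
            error = str(e)
    raise RuntimeError(f"No valid query after {max_attempts} attempts; last error: {error}")


def fake_model(question: str, error: Optional[str]) -> str:
    """Hypothetical stand-in for an LLM: wrong column first, corrected after seeing an error."""
    if error is None:
        return "SELECT COUNT(DISTINCT serial_number) FROM drivestats WHERE drive_date = '2024-09-01'"
    return "SELECT COUNT(DISTINCT serial_number) FROM drivestats WHERE date = '2024-09-01'"


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE drivestats (date TEXT, serial_number TEXT)")
    conn.executemany("INSERT INTO drivestats VALUES (?, ?)",
                     [("2024-09-01", "A"), ("2024-09-01", "B"), ("2024-09-02", "A")])
    rows, attempts = answer_with_retries(conn, "How many drives on 9/1/2024?", fake_model)
    print(rows, len(attempts))  # [(2,)] 2
```

Capping the number of attempts matters: without it, a model that never produces a valid query would loop forever.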

Creating a ReAct AI agent with LangGraph

The LangGraph framework gives you the capability to create AI agents based on arbitrarily complex logic. In this article, I’ve used its prebuilt ReAct (Reason+Act) agent, since it neatly demonstrates the agent concept, rewriting the SQL query repeatedly in response to database errors.

There are three steps to creating the agent. The first is to create an instance of LangChain’s SQLDatabaseToolkit, passing it the database and model, and obtain its list of tools:

from langchain_community.agent_toolkits import SQLDatabaseToolkit

toolkit = SQLDatabaseToolkit(db=db, llm=llm)
tools = toolkit.get_tools()

The tools list contains tools that execute queries, retrieve the names, schemas and content of database tables, and check SQL query syntax.

The next step is to retrieve a suitable prompt template from the Prompt Hub and populate the template placeholders:

prompt_template = hub.pull("langchain-ai/sql-agent-system-prompt")
system_message = prompt_template.format(dialect=db.dialect, top_k=10)

Here is the prompt template’s text:

================================ System Message ================================

You are an agent designed to interact with a SQL database.
Given an input question, create a syntactically correct {dialect} query to run, then look at the results of the query and return the answer.
Unless the user specifies a specific number of examples they wish to obtain, always limit your query to at most {top_k} results.
You can order the results by a relevant column to return the most interesting examples in the database.
Never query for all the columns from a specific table, only ask for the relevant columns given the question.
You have access to tools for interacting with the database.
Only use the below tools. Only use the information returned by the below tools to construct your final answer.
You MUST double check your query before executing it. If you get an error while executing a query, rewrite the query and try again.

DO NOT make any DML statements (INSERT, UPDATE, DELETE, DROP etc.) to the database.

To start you should ALWAYS look at the tables in the database to see what you can query.
Do NOT skip this step.
Then you should query the schema of the most relevant tables.

Now we can create an instance of the prebuilt agent:

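from langgraph.prebuilt import create_react_agent

# Newer LangGraph releases name this parameter prompt instead of state_modifier
agent_executor = create_react_agent(llm, tools, state_modifier=system_message)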

Note how the agent must select the next step, and how the flow can cycle between the agent and tools steps:

display(Image(agent_executor.get_graph().draw_mermaid_png()))
A diagram showing the workflow between tools and agent. The workflow is as follows: start, agent, then a split option to access tools (a recursive step), or to end. The diagram shows that after agent, you can optionally select tools or end, indicating that you can end without choosing tools.

Again, we can stream the agent’s execution to show us each step of its operation.

for step in agent_executor.stream(
    {"messages": [{"role": "user", "content": "Each drive has its own serial number. How many drives did each data center have on 9/1/2024?"}]},
    stream_mode="values",
):
    step["messages"][-1].pretty_print()

The output from this flow is over 300 lines long; I posted it in its entirety as a Gist, but I’ll summarize the steps here:

  • Question: Each drive has its own serial number. How many drives did each data center have on 9/1/2024?
  • The model calls the “list tables” tool.
  • The list tables tool responds with a single table name, drivestats.
  • The model calls the “get schema” tool, passing it the table name.
  • The get schema tool responds with the schema and three sample rows from the drivestats table.
  • The model submits a query to the “query checker” tool:
    SELECT datacenter, COUNT(serial_number) AS drive_count FROM drivestats WHERE date = '2024-09-01' GROUP BY datacenter ORDER BY drive_count DESC LIMIT 10;
  • The query checker responds with the checked query, which is the same as its input. Note that the query checker only checks the SQL query’s syntax. The query contains the same data type mismatch as the query we generated earlier, as well as another error, as we’re about to discover.
  • The model submits the query to the “query executor” tool.
  • The query executor responds with a syntax error: Trino does not allow a trailing semicolon on the query.
  • The model submits a modified query to the query checker tool:
    SELECT datacenter, COUNT(serial_number) AS drive_count FROM drivestats WHERE date = '2024-09-01' GROUP BY datacenter ORDER BY drive_count DESC LIMIT 10
  • The query checker responds with the checked query, which is the same as its input.
  • The model submits the query to the “query executor” tool.
  • The query executor responds with a type mismatch error since the query tries to compare a string value with a date column.
  • The model submits a query with the necessary DATE type identifier to the query checker tool:
    SELECT datacenter, COUNT(serial_number) AS drive_count FROM drivestats WHERE date = DATE '2024-09-01' GROUP BY datacenter ORDER BY drive_count DESC LIMIT 10
  • The query checker responds with the checked query, which is the same as its input.
  • The model submits the query to the “query executor” tool.
  • The query executor responds with a result set:
    [
      ('phx1', 89477),
      ('sac0', 78444),
      ('sac2', 60775),
      ('', 24080),
      ('iad1', 22800),
      ('ams5', 16139)
    ]
  • The model returns a message containing the answer:

On September 1, 2024, the following datacenters had the specified number of drives:

1. phx1: 89,477 drives
2. sac0: 78,444 drives
3. sac2: 60,775 drives
4. (unknown datacenter): 24,080 drives
5. iad1: 22,800 drives
6. ams5: 16,139 drives

These results show the datacenters with their respective drive counts.

Now let’s see if the model can calculate the annualized failure rate of a drive model. We’ll use the Seagate ST4000DM000, just because that is the drive model with the most days of operation in the dataset.

# Stream the agent's progress, printing the newest message at each step
for step in agent_executor.stream(
        {"messages": [{"role": "user", "content": "Each drive has its own serial number. What is the annualized failure rate of the ST4000DM000 drive model?"}]},
        stream_mode="values",
):
    step["messages"][-1].pretty_print()

The agent’s response mixes Markdown and LaTeX notation. I used QuickLaTeX to render the LaTeX to images:

The annualized failure rate (AFR) for the ST4000DM000 drive model can be calculated using the following information:

– Total failures: 5,791

– Total drives: 37,040

– Time period: from May 10, 2013, to September 30, 2024, which is approximately 11.35 years.

The formula for calculating the annualized failure rate is:

$$\text{AFR} = \left( \frac{\text{Total Failures}}{\text{Total Drives}} \right) \times \left( \frac{1}{\text{Total Years}} \right) \times 100$$

Plugging in the numbers:

$$\text{AFR} = \left( \frac{5791}{37040} \right) \times \left( \frac{1}{11.35} \right) \times 100 \approx 13.77\%$$

Therefore, the annualized failure rate (AFR) of the ST4000DM000 drive model is approximately 13.77%.

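It’s impressive that the agent shows its working so comprehensively, but, unfortunately, it arrives at the wrong answer. Those drives were not all running for the entire span of the Drive Stats dataset, so multiplying total drives by total years overstates the drive-years of operation. The correct calculation involves counting the number of days with data for those drives and dividing it by 365 to get the correct number of years of operation. Even the arithmetic is off: (5,791 / 37,040) × (1 / 11.35) × 100 is approximately 1.38%, not 13.77%.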
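A toy example with made-up numbers shows why that matters. Two hypothetical drives are observed over a two-year span. One runs the whole time, and the other runs for 183 days before failing. The agent's formula divides by drives × years, while the drive-days approach divides by the days the drives actually ran.

```python
# Toy data (made up): days of operation and whether the drive failed
drives = {
    "A": {"days": 730, "failed": False},  # ran for the whole two-year span
    "B": {"days": 183, "failed": True},   # ran about six months, then failed
}
span_years = 2.0
failures = sum(d["failed"] for d in drives.values())

# The agent's approach: failures / drives / years in the span
naive_afr = failures / len(drives) / span_years * 100

# Drive-days approach: count only the days each drive actually ran
drive_days = sum(d["days"] for d in drives.values())
afr = 100 * failures / (drive_days / 365)

print(f"agent's formula: {naive_afr:.1f}%, drive-days formula: {afr:.1f}%")
```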

It’s clear that the model is not able to answer questions on drive reliability given the data available to it so far. The solution lies in prompt engineering—providing more context on the semantics of the data in the system prompt.

We can extend the default AI agent system prompt template to include specific instructions on working with the Drive Stats dataset:

prompt_template.messages[0].prompt.template += """
Each row of the drivestats table records one day of a drive’s operation, and contains the serial number of a drive, its model name, capacity in bytes, whether it failed on that day, SMART attributes and identifiers for the slot, pod, vault, cluster and data center in which it is located.

Use this calculation for the annualized failure rate (AFR) for a drive model over a given time period:

1. **drive_days** is the number of rows for that model during the time period.
2. **failures** is the number of rows for that model during the time period where **failure** is equal to 1.
3. **annual failure rate** is 100 * (**failures** / (**drive_days** / 365)).

Use double precision arithmetic in the calculation to avoid truncation errors. To convert an integer **i** to a double, use CAST(**i** AS DOUBLE)

Note that the date column is a DATE type, not a string. Use the DATE type identifier when comparing the date column to a string.

Do not add a semi-colon suffix to SQL queries."""

Now, when we ask the same question on the annual failure rate of the ST4000DM000 drive model, the AI agent generates a correct SQL query and a more concise, and correct, final response (you can inspect the full output here).

SELECT 100 * (CAST(COUNT(CASE WHEN failure = 1 THEN 1 END) AS DOUBLE) / (COUNT(*) / 365)) AS annual_failure_rate
FROM drivestats
WHERE model = 'ST4000DM000'

The annual failure rate (AFR) for the ST4000DM000 drive model is approximately 2.63%.
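The generated query has one detail worth checking. COUNT(*) / 365 divides two integers, and Trino, like many SQL engines, truncates integer division. Across the millions of drive days recorded for the ST4000DM000, the error is negligible, but on a small sample it is not. The sketch below uses Python's built-in sqlite3 module, which also truncates integer division, on a made-up table. The table holds 1,000 drive days with a single failure, and the model name `ST0000TOY` is invented. The sketch compares the query as generated with a version that also casts the drive-day count to DOUBLE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE drivestats (model TEXT, failure INTEGER)")
# 1,000 made-up drive days for one model, with a single failure
rows = [("ST0000TOY", 0)] * 999 + [("ST0000TOY", 1)]
conn.executemany("INSERT INTO drivestats VALUES (?, ?)", rows)

# Same shape as the generated query: COUNT(*) / 365 is integer division
as_generated = """
SELECT 100 * (CAST(COUNT(CASE WHEN failure = 1 THEN 1 END) AS DOUBLE) / (COUNT(*) / 365))
FROM drivestats WHERE model = 'ST0000TOY'
"""
# Cast the drive-day count too, so dividing by 365 keeps its fraction
all_double = """
SELECT 100 * (CAST(COUNT(CASE WHEN failure = 1 THEN 1 END) AS DOUBLE) / (CAST(COUNT(*) AS DOUBLE) / 365))
FROM drivestats WHERE model = 'ST0000TOY'
"""
truncated = conn.execute(as_generated).fetchone()[0]
exact = conn.execute(all_double).fetchone()[0]
print(f"as generated: {truncated:.2f}%  with cast: {exact:.2f}%")
```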

Let’s ask the AI agent for a statistic that we can corroborate from the Backblaze Drive Stats for Q3 2024 blog post.

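from IPython.display import display_markdown

response = agent_executor.invoke(
    {"messages": [{"role": "user", "content": "What was the annual failure rate of the ST8000NM000A drive model in Q3 2024?"}]}
)
# The third-to-last message is the agent's final SQL tool call
response['messages'][-3].pretty_print()
# The last message is the agent's answer, rendered as Markdown
display_markdown(response['messages'][-1].content, raw=True)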

The query makes sense, and the response agrees with the table in the blog post:

SELECT 100 * (CAST(SUM(failure) AS DOUBLE) / (COUNT(*) / 365)) AS annual_failure_rate
FROM drivestats
WHERE model = 'ST8000NM000A' AND date >= DATE '2024-07-01' AND date < DATE '2024-10-01'

The annual failure rate (AFR) of the ST8000NM000A drive model in Q3 2024 is approximately 1.61%.

Interestingly, this time the SQL query used SUM(failure) to count the number of failures, rather than the equivalent, but rather long-winded, COUNT(CASE WHEN failure = 1 THEN 1 END) it used in the previous query. Also, looking at the full response, we can see that, as directed by the custom prompt, the agent generated the correct syntax for comparing dates, so it didn’t need to correct and retry any queries.
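SUM(failure) and COUNT(CASE WHEN failure = 1 THEN 1 END) give the same result as long as the failure column holds only 0 or 1, as it does in Drive Stats. If the column ever held other values, only the COUNT form would still count failure days. A quick check with Python's sqlite3 module on a few made-up rows illustrates the equivalence:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE drivestats (model TEXT, failure INTEGER)")
# Made-up rows: failure is 1 on the day a drive fails, otherwise 0
conn.executemany(
    "INSERT INTO drivestats VALUES (?, ?)",
    [("M1", 0), ("M1", 1), ("M1", 0), ("M1", 1), ("M2", 0)],
)

query = """
SELECT model,
       SUM(failure) AS by_sum,
       COUNT(CASE WHEN failure = 1 THEN 1 END) AS by_count
FROM drivestats
GROUP BY model
ORDER BY model
"""
results = conn.execute(query).fetchall()
for model, by_sum, by_count in results:
    print(model, by_sum, by_count)
```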

Finally, let’s ask a more convoluted question, including the constraints given in the blog post:

response = agent_executor.invoke(
    {"messages": [{"role": "user", "content": "Considering only drive models which had at least 100 drives in service at the end of the quarter and which accumulated 10,000 or more drive days during the quarter, which drive had the most failures in Q3 2024, and what was its failure rate?"}]}
)
# Print the final SQL tool call, then render the Markdown answer
response['messages'][-3].pretty_print()
display_markdown(response['messages'][-1].content, raw=True)

Again, the AI agent is able to generate a valid SQL query, this time including a subquery, and its response matches the data from the blog post exactly:

WITH drive_stats AS (
    SELECT model,
           COUNT(DISTINCT serial_number) AS drive_count,
           COUNT(*) AS drive_days,
           COUNT(CASE WHEN failure = 1 THEN 1 END) AS failures
    FROM drivestats
    WHERE date >= DATE '2024-07-01' AND date < DATE '2024-10-01'
    GROUP BY model
    HAVING COUNT(DISTINCT serial_number) >= 100 AND COUNT(*) >= 10000
)
SELECT model,
       failures,
       100 * (CAST(failures AS DOUBLE) / (CAST(drive_days AS DOUBLE) / 365)) AS failure_rate
FROM drive_stats
ORDER BY failures DESC
LIMIT 10

The drive model with the most failures in Q3 2024 is the TOSHIBA MG08ACA16TA, which had 181 failures. Its failure rate during this period was approximately 1.84%.
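The answer matched the blog post, but generated SQL deserves the same scrutiny as a colleague’s. The query interprets “at least 100 drives in service at the end of the quarter” as COUNT(DISTINCT serial_number) over the whole quarter. That count also includes drives that failed or were removed before September 30, so the two readings can diverge for a model being retired or deployed mid-quarter. A minimal sqlite3 sketch shows the difference. It uses made-up rows and stores dates as ISO-8601 text, so string comparison orders them correctly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE drivestats (date TEXT, serial_number TEXT, model TEXT)")
# Made-up rows: drive S3 stopped reporting before the last day of the quarter
conn.executemany(
    "INSERT INTO drivestats VALUES (?, ?, ?)",
    [
        ("2024-07-01", "S1", "M1"), ("2024-09-30", "S1", "M1"),
        ("2024-07-01", "S2", "M1"), ("2024-09-30", "S2", "M1"),
        ("2024-07-01", "S3", "M1"),
    ],
)

# What the generated query counts: any drive seen during the quarter
seen_in_quarter = conn.execute(
    "SELECT COUNT(DISTINCT serial_number) FROM drivestats "
    "WHERE model = 'M1' AND date >= '2024-07-01' AND date < '2024-10-01'"
).fetchone()[0]

# Stricter reading: drives reporting on the last day of the quarter
at_quarter_end = conn.execute(
    "SELECT COUNT(DISTINCT serial_number) FROM drivestats "
    "WHERE model = 'M1' AND date = '2024-09-30'"
).fetchone()[0]

print(seen_in_quarter, at_quarter_end)
```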

Closing thoughts

My experience building an AI agent was one of astonishment at its ability to correctly generate quite complex SQL queries from natural language instructions, tempered by frustration at its limitations. In particular, it would confidently generate an incorrect response rather than saying “I’m sorry, but I don’t know how to do that.” Your AI agent development process should include generous testing time, as well as ongoing monitoring to ensure that it is coming up with the right answers.

The post Building an AI Agent with Backblaze B2, LangChain, and Drive Stats appeared first on Backblaze Blog | Cloud Storage & Cloud Backup

Video Surveillance Storage in the Cloud

Post Syndicated from Jeremy Milk original https://www.backblaze.com/blog/video-surveillance-storage-in-the-cloud/


If you manage a video surveillance platform, storage is either the unsung hero of your infrastructure or your biggest headache. Maybe what started as a neat and functional setup for your video surveillance storage has turned into a complex system that demands far more maintenance than you expected. 

As your storage volume grows and costs climb, the initial solutions begin to show their cracks, demanding more resources and attention. Higher-resolution cameras push storage demand up even further, as does the growing number of cameras your customers want to install. 

Today, I’m outlining some of the obstacles I’ve seen companies encounter in video surveillance storage as well as the benefits cloud storage offers to help you streamline operations, rein in costs, and regain control over your architecture, without sacrificing performance or flexibility.

Video surveillance storage challenges

Storage infrastructure is the backbone of your service, but it can also be one of the most demanding aspects to manage, whether your data is stored in on-premises NVR/DVR systems, in the cloud, or in a hybrid model. Some of the key challenges include explosive data growth, balancing cost and performance, regulatory and compliance hurdles, latency and accessibility, data security and redundancy, and integrating AI. I’ll dig into each, and talk about some of the pitfalls you might face. 

  • Explosive data growth: The sheer volume of video data generated by modern surveillance systems is staggering. With cameras capturing high-definition footage 24/7, even modest setups can produce terabytes of data per day. Scaling storage to keep up with this growth without compromising performance or breaking the bank is no small feat.
  • Balancing cost and performance: As storage needs grow, so do bandwidth requirements, data egress, and ongoing maintenance costs. Striking the right balance between affordability and the high availability your customers expect is a constant juggling act.
  • Regulatory and compliance hurdles: Platforms must manage varying retention policies, ensure data privacy, and provide secure access controls to adhere to all relevant standards and requirements, such as GDPR and local video retention regulations—all while keeping everything running smoothly.
  • Latency and accessibility: Your customers demand quick access to their video footage, whether it’s for live viewing or playback. High latency or sluggish retrieval times can lead to frustration and impact trust in your platform. Designing storage that ensures fast, reliable access to data across geographies is critical, but challenging.
  • Data security and redundancy: Video footage isn’t just data—it’s sensitive, often mission-critical information that cybercriminals are increasingly targeting. Protecting it from ransomware, accidental loss, or corruption requires robust encryption, multiple backups, and careful management of access rights. Achieving this level of security without overcomplicating your architecture is a balancing act.
  • Integrating AI: AI-powered analytics can enhance video surveillance capabilities (e.g., object detection, behavior analysis), but integrating AI is not without its challenges, and it’s something you need to consider carefully.
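Some rough arithmetic puts numbers on that growth. A camera recording around the clock writes its bitrate times 86,400 seconds of video per day. The sketch below assumes a hypothetical deployment of 50 cameras averaging 4 Mbps with 30-day retention. Real bitrates vary widely with resolution, codec, frame rate, and motion-triggered recording, so treat the output as an order-of-magnitude estimate.

```python
SECONDS_PER_DAY = 86_400


def daily_storage_tb(cameras: int, bitrate_mbps: float) -> float:
    """Terabytes (10**12 bytes) written per day by cameras recording 24/7."""
    bits_per_day = cameras * bitrate_mbps * 1_000_000 * SECONDS_PER_DAY
    return bits_per_day / 8 / 1e12


def retained_storage_tb(cameras: int, bitrate_mbps: float, retention_days: int) -> float:
    """Steady-state footprint once the retention window has filled up."""
    return daily_storage_tb(cameras, bitrate_mbps) * retention_days


# Hypothetical deployment: 50 cameras at 4 Mbps, 30-day retention
per_day = daily_storage_tb(50, 4)
retained = retained_storage_tb(50, 4, 30)
print(f"{per_day:.2f} TB/day, {retained:.1f} TB retained")
```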

Advantages of the cloud for video surveillance platforms

Picking the right cloud storage platform can help you meet these challenges. Cloud storage offers a host of advantages that can transform how video surveillance platforms manage and scale their operations. Here’s why it’s worth considering:

1. Scalability and flexibility

Need to add storage for hundreds, or thousands, of new cameras per month? On-site storage solutions, such as physical servers, hard drives, or NAS systems, have fixed capacities. Cloud storage scales with your platform, whether you’re serving small businesses or sprawling enterprise deployments, and it adapts as your customers add cameras or extend their retention periods.

2. Cost optimization

Traditional on-prem solutions demand upfront investments in hardware and then ongoing maintenance. Cloud storage eliminates the need for those upfront hardware purchases. And the right cloud storage provider with pricing models built for your use case can help you get an edge in a competitive market. 

The unique challenge of PUT requests

The video surveillance use case involves uploading a metric ton of data. In cloud terms, when you upload data, you typically use a PUT request. Many cloud providers charge for these API calls on both hot and cold storage tiers. For example, it might be tempting to use a cold storage class to save on the base cost of storing your video surveillance data, but a cold storage class like Amazon S3 Glacier Instant Retrieval (GIR) charges $0.00002 per PUT request. That is tiny per request, but a platform ingesting footage from thousands of cameras can make tens of millions of PUT requests per day. At that volume, the charges quickly add up to tens or even hundreds of thousands of dollars a year.
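To estimate those charges, multiply the requests per day by the per-request price. The sketch uses the $0.00002 per-PUT figure quoted above. The platform size and upload pattern are hypothetical: 10,000 cameras, each uploading one object per 10-second segment around the clock. Prices change, so check the provider’s current price list before relying on the result.

```python
from decimal import Decimal

PRICE_PER_PUT = Decimal("0.00002")  # per-request figure quoted above


def annual_put_cost(cameras: int, segment_seconds: int,
                    price: Decimal = PRICE_PER_PUT) -> Decimal:
    """Yearly PUT charges if every camera uploads one object per segment, 24/7."""
    # Whole segments per day; any partial segment at the end is ignored
    puts_per_day = cameras * (86_400 // segment_seconds)
    return puts_per_day * price * 365


# Hypothetical platform: 10,000 cameras, each uploading 10-second segments
cost = annual_put_cost(10_000, 10)
print(f"${cost:,.2f} per year in PUT fees")
```

Decimal keeps the currency arithmetic exact, which avoids the small rounding errors that binary floating point would introduce.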

3. Accessibility

Whether your users are in the same city or halfway around the world, cloud storage makes video footage instantly accessible. Low-latency retrieval options and distributed cloud infrastructure mean faster access to the right data, regardless of location.

4. Reduced risk of data loss

Using the cloud for your data storage infrastructure mitigates the risks associated with physical vulnerabilities in the devices your customers use. When footage is automatically stored in the cloud, it remains accessible even if on-site devices are damaged, malfunction, or fail. Replicating data across multiple geographically dispersed data centers further reduces the risk of data loss due to hardware failures or natural disasters. 

5. Enhanced security

Modern cloud storage solutions come with enterprise-grade security baked in, from end-to-end encryption to role-based access controls to scalable application keys. With built-in redundancy and regular backups, the cloud offers peace of mind that on-prem systems can’t always match—no more worrying about a single failed drive wiping out critical footage.

6. Minimized maintenance and IT overhead

Maintaining on-premises NVR/DVR systems requires substantial hardware and IT resources. In contrast, cloud systems are easier to manage. Updates, patches, and maintenance are handled by the cloud service provider. Your IT teams can focus on strategic tasks rather than routine server upkeep. This shift reduces your operational costs and streamlines management.

7. Ease of integration with AI and analytics

Storing your footage in the cloud makes it easier to integrate advanced features like AI-powered analytics, motion detection, and real-time alerts. You have a centralized repository of data that’s easy to analyze at scale. And, as opposed to on-prem systems, cloud platforms offer instant access from anywhere, making it more viable to train AI models on your own data.

Backblaze for video surveillance storage

Unlike some cloud providers and storage classes, Backblaze makes it affordable to secure, grow, and use your surveillance data. Backblaze’s straightforward pricing with zero PUT fees means you don’t have to worry about fees adding up, allowing you to store, manage, and access your data anywhere, anytime. And with free egress of up to 3x the data you store, plus unlimited free egress via many compute and CDN partners, you can more easily integrate with AI tools, sending your footage wherever the right GPUs are available without breaking the bank.


The future of video surveillance is in the cloud

Beyond the obvious scalability and operational gains of cloud storage, its biggest advantage in the coming years will be its AI readiness. AI innovations are poised to revolutionize security and video surveillance systems. By leveraging the cloud for AI and analytics, video surveillance platforms can move beyond mere storage to deliver actionable insights and real-time intelligence. This not only enhances the value you provide to customers but also differentiates your platform in an increasingly competitive market.

Cloud storage isn’t a one-size-fits-all solution, but for video surveillance platforms looking to streamline operations and future-proof their offerings, it’s hard to ignore the shift towards the cloud. 

The post Video Surveillance Storage in the Cloud appeared first on Backblaze Blog | Cloud Storage & Cloud Backup

Disaster Recovery 101: Hot vs. Warm vs. Cold DR Sites

Post Syndicated from Kari Rivas original https://www.backblaze.com/blog/disaster-recovery-101-hot-vs-warm-vs-cold-dr-sites/


It goes without saying (but I will say it anyway) that having a disaster recovery (DR) site is essential to protecting business continuity (BC) in the face of disasters both big and small. However, even for large enterprises, building and maintaining a separate physical facility to store data copies can be cost prohibitive, and it may not make sense operationally. 

DR sites differ according to the availability of data for retrieval and by type of ownership (e.g., fully owned or colocated). In recent years, public cloud has also emerged as a viable DR “site”—meaning that backups, production data, and/or virtualized infrastructure can be effectively housed in the cloud. 

In this blog, I’ll examine the primary differences between various types of DR sites, along with their pros and cons, and I’ll outline the most important criteria for deciding on the right DR setup for your business.

Proprietary ownership vs. colocation

If your business is able to fully invest in owning a DR site, the obvious upsides are greater control over security and infrastructure. But owning and operating your own site may still not be the best option, given the staffing and expertise required. For many businesses, it doesn’t make sense to invest in owning and operating a data center when that’s not their area of expertise. 

That’s why many businesses opt for colocation. It can be a great option for adhering to your DR strategy and your expense limits. However, you must be careful to thoroughly vet the location and provider. Here are a few important points to consider:

  • Performance: You should understand what kind of equipment is used at the DR site, as well as what kind of durability and availability you can expect. Ensure that the available infrastructure can meet your required recovery time objectives (RTO) and recovery point objectives (RPO)—that is, the maximum amount of downtime your business can withstand and the maximum amount of data your organization can tolerate losing, respectively.
  • Security: A trustworthy provider should be staffed 24/7/365. Learn how the data center is protected. Are there cameras? Biometric security? How does the data center protect against things like fire and power loss?
  • Proximity: A data center that’s down the street from your primary location will offer no protection in the case of a regional disaster like wildfire or tornado—events that are unfortunately becoming more and more common. Ideally, you should choose a location that is far from your production facility. This is where the public cloud naturally fits in—but more on that in a bit.
  • Scalability: Gauge how much data you currently need to store as well as how much you expect to grow in the near future. Find out how much capacity the DR site can support and choose a site that can accommodate your planned growth.
  • Costs: Get a complete view of your total cost of ownership. This includes not only one-time costs to get started and ongoing monthly or yearly expenses, but also potential costs for things like additional support or any capacity you may need to add in the middle of a contract period.
  • Compliance: Consider what compliance requirements your business must support. Some data centers are SOC 2 compliant; some are not. It’s also important to check your cyber insurance policy requirements. Many policies may require that you keep data backups in a facility that is far from your own. This is exactly the requirement that brought telco AcenTek to Backblaze.
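You can run a back-of-the-envelope check of a DR plan against those objectives. With periodic backups, the worst-case data loss is roughly the backup interval. The time to restore is at least the data volume divided by the restore throughput. The function below is a hypothetical sketch with made-up inputs. It counts transfer time only and ignores factors such as verification and application restart time.

```python
from dataclasses import dataclass


@dataclass
class DrPlan:
    backup_interval_hours: float  # time between completed backups
    data_tb: float                # data to restore after a disaster, in TB
    restore_gbps: float           # measured restore throughput, gigabits/s


def worst_case_rpo_hours(plan: DrPlan) -> float:
    # A disaster just before the next backup loses a full interval of changes
    return plan.backup_interval_hours


def min_restore_hours(plan: DrPlan) -> float:
    # Transfer time only: 1 TB = 8,000 gigabits
    gigabits = plan.data_tb * 8_000
    return gigabits / plan.restore_gbps / 3_600


def meets_objectives(plan: DrPlan, rpo_hours: float, rto_hours: float) -> bool:
    return (worst_case_rpo_hours(plan) <= rpo_hours
            and min_restore_hours(plan) <= rto_hours)


# Hypothetical: nightly backups, 50 TB of data, 10 Gbps restore throughput
plan = DrPlan(backup_interval_hours=24, data_tb=50, restore_gbps=10)
print(f"restore takes at least {min_restore_hours(plan):.1f} h")
print(meets_objectives(plan, rpo_hours=24, rto_hours=12))
```

Measuring restore throughput with a test restore, rather than assuming a link’s nominal speed, makes this estimate far more trustworthy.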

Meeting cyber insurance requirements with the cloud

To satisfy cyber insurance policy requirements, AcenTek’s backups needed to be off-site and geographically distant from their own data centers. Backblaze offered two critical features (immutability and certification as a Veeam Ready Object partner) as well as geographic distance from AcenTek’s own data centers, meeting the requirements and protecting AcenTek’s business.


Hot, warm, and cold DR sites: Choosing the right strategy

Recovery sites are often referred to by temperature (hot, warm, cold) to describe the speed and importance of applications and data in those protected sites. The ideal DR site temperature depends on your organization’s budget, risk tolerance, and RTOs. Businesses with critical systems requiring near-instantaneous recovery might opt for a hot site. Others might find a warm site or even a cold site a more cost-effective option for less time-sensitive systems.

Hot, warm, and cold: Choosing the right DR site temperature

| | Hot site | Warm site | Cold site |
|---|---|---|---|
| Description | A fully functional replica of your primary production resources, constantly maintained and ready for immediate failover in the cloud or to a secondary on-premises site. | A pre-configured cloud recovery site or hybrid recovery with hardware and software infrastructure. Requires some manual intervention (e.g., software installation) before becoming operational. | A basic physical facility with essential infrastructure (power, cooling, and network connectivity) requiring significant configuration and installation before use. May also include cold cloud storage. |
| Pros | Fastest recovery times due to the site’s constant readiness. | A balance between cost and recovery time. Faster than cold sites, but slower than hot sites. | Most cost-effective option, requiring minimal ongoing maintenance. |
| Cons | This is the most expensive option due to the need for complete infrastructure replication. | Still requires some manual setup, potentially delaying recovery time. | Longest recovery times due to the extensive configuration and installation needed or, in the case of cold cloud storage, the time required to retrieve your data. |
| Example RTO goal times | RTO < 15 minutes | RTO < 24 hours | RTO > 24 hours |

Public cloud as virtual DR site

Traditionally, DR for large enterprises involved building a physical site to meet RTOs. However, building or buying a dedicated DR site might not be the most cost-effective option for all backups. Cloud storage offers a compelling solution specifically for backups, even if you have your own physical DR site.

Why Backblaze works for DR

Cloud storage from a specialized provider like Backblaze is generally more affordable and scalable than on-premises storage solutions or off-site DR facilities, making it a great fit for this purpose. Backblaze offers always-hot storage with free egress of up to 3x the data you store, meaning data can be immediately recovered when needed without surprise egress bills. In this way, Backblaze B2 Cloud Storage constitutes a virtualized hot DR site.


Cold cloud storage considerations

While some consider cold cloud storage to be the most cost-effective solution, its cost savings are often entirely negated by long retrieval times and egress charges, so much so that it is no longer a viable disaster recovery option.

Evaluating cloud storage providers

In a way, you can consider the public cloud very similarly to a colocated DR site. All the same questions apply when choosing between cloud storage providers (CSPs):

  • Performance: What durability, reliability, and availability does the CSP offer? What kind of throughput do you get on a proof of concept?
  • Security: Does the CSP staff their data centers 24/7/365? What security processes and procedures are in place?
  • Proximity: Where are the CSP’s data centers located? Choose one that offers good geographic separation from your production facility while ensuring you can still meet your RTO with latency considered.
  • Scalability: Cloud storage naturally offers virtually unlimited scalability, but it’s vitally important to ask your CSP how they handle things like capacity overages or the need to purchase additional capacity. Some CSPs will charge you excessive fees when you go over capacity, or they may require you to switch to a different pricing model if you need additional storage space in the middle of a contract period.
  • Costs: Again, you need a complete view of your TCO. Watch out for things like minimum retention periods, egress charges, and other hidden fees.
  • Compliance: Be careful of CSPs that claim they’re SOC 2 compliant. Sometimes the CSP operates in SOC 2 compliant data centers but the company is not SOC 2 compliant itself. That difference may be meaningful to your company or your own compliance requirements.

Ultimately, you must carefully balance business requirements for RTO and RPO with DR investment costs. Businesses located in disaster-prone areas like Tornado Alley, earthquake zones, or coastal regions are well served by the additional investment in DR infrastructure. But even if your company has its own DR site, public cloud can be a beneficial supplement to your own DR infrastructure.

The post Disaster Recovery 101: Hot vs. Warm vs. Cold DR Sites appeared first on Backblaze Blog | Cloud Storage & Cloud Backup

AI for Enterprise: Getting Started

Post Syndicated from Stephanie Doyle original https://www.backblaze.com/blog/ai-for-enterprise-getting-started/


AI is here to stay, and the question on everyone’s mind is how to implement it successfully. If you’re ready to implement AI in your business, consider this article a good jumping off point. I’ll talk about different options for integrating it into your operations and how to make it truly custom, based on your own data, and useful for your business.

More from AI 101

Want to read more about AI? We’ve got you covered in our AI 101 series. And, here’s a sampling that might be useful when you’re thinking about building AI into your business.

How many companies use AI today?

How many businesses are using AI, you ask? Well, let’s ask Google. According to their AI overview (yes, we appreciate the irony), anywhere between 55% and 83% of companies are using or exploring AI in some way.

A screenshot of the Google AI overview that results from the query "how many businesses use AI"?

It’s not lost on me that the above results illustrate some of the big limitations of AI: namely, that it’s only as good as the data it’s trained on, it’s far from infallible, and it can’t replace humans wholesale, especially when someone needs to fact-check its results. Google’s AI Overviews have been criticized for providing inaccurate information, hallucinating (with sometimes hilarious results), giving neat answers to complicated questions, drawing on unreliable sources, showing potential bias, and so on. Nevertheless, the feature has had several updates since it was first released (which at least means it’s no longer telling us to put glue on pizza).  

But, setting all that aside, this is actually a great example to consider before we dig into options for incorporating AI into your business. AI Overviews have improved enough (for example, by adding source transparency) that, with a modest amount of human oversight, we can consider the above directionally accurate. The technology landscape is changing, and, ready or not, businesses are being forced to figure out how AI should fit into their strategies.  

What we’ll talk about today

Today we’ll talk about some foundational topics you need to understand when deciding how to incorporate AI into your business. We’ll define the following: 

  1. Software as a service (SaaS) AI add-ons
  2. AI as a service (AIaaS)
  3. Foundation models
  4. Retrieval augmented generation (RAG)

Those definitions will lead us quickly to some practical examples that illustrate how businesses are using AI. 

Software as a service (SaaS) applications, aka, AI as a feature

You may have noticed that many of the web-based applications you are using are suddenly AI-powered or have AI capabilities. While some of that is marketing hype, this could be a way to get started with AI in your organization—by simply turning on a feature in a SaaS product you’re already using. There are lots of ways to do this—Slack, for example, offers AI tools for summarizing and answering questions to help teams work faster.

Example AI use case: AI in customer support

Generative AI capabilities such as chatbots are often added to customer-facing applications like your customer support service. The chatbot is trained using your product support materials or actual questions your staff previously answered.  

If you provide a collection of questions and answers written by your staff, the chatbot can be trained to respond in your unique company voice.

A screenshot of the Backblaze chatbot live on www.backblaze.com.
Oh hey, there’s ours!

Before you activate and use a built-in AI feature of an existing service, you’ll want to determine how you can measure any changes in overall productivity and user satisfaction. In the customer service example above, that could be capturing metrics such as a customer satisfaction rating, time to first contact, time-to-resolution, escalation ratio, and so on. Then establish a baseline for the existing system before engaging the AI assistant and set specific points where you will compare that baseline to the AI powered system. 
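Here is a minimal sketch of that baseline comparison. It assumes you can export ticket records with a resolution time, an escalation flag, and a satisfaction score. The `Ticket` fields and all numbers below are hypothetical. The same summary is computed for the period before and the period after enabling the assistant.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class Ticket:
    hours_to_resolution: float
    escalated: bool
    csat: int  # customer satisfaction rating, 1-5


def summarize(tickets: list[Ticket]) -> dict[str, float]:
    """Median time-to-resolution, escalation ratio, and mean CSAT."""
    return {
        "median_hours_to_resolution": median(t.hours_to_resolution for t in tickets),
        "escalation_ratio": sum(t.escalated for t in tickets) / len(tickets),
        "mean_csat": sum(t.csat for t in tickets) / len(tickets),
    }


# Made-up samples from before and after the AI assistant went live
baseline = [Ticket(10, True, 3), Ticket(6, False, 4), Ticket(8, False, 4), Ticket(12, True, 2)]
with_ai = [Ticket(4, False, 4), Ticket(7, True, 3), Ticket(3, False, 5), Ticket(5, False, 4)]

before, after = summarize(baseline), summarize(with_ai)
for metric in before:
    print(f"{metric}: {before[metric]:.2f} -> {after[metric]:.2f}")
```

The median is less sensitive than the mean to a few unusually long tickets, which is why it is used for resolution time here.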

Using an AI-powered service has many benefits, but there are several things to consider:

  • You are limited in functionality by what the vendor provides.
  • What is the expertise of the software vendor in developing, training, and implementing an AI model?
  • What happens when the model data changes? For example, you’ve employed AI to respond to customer queries. What happens when you add a new product to your lineup or a new feature to an existing product? Is the model retrained? What are the costs? Does it still make economic sense given any new cost? 
  • During the model creation and operational phases, ancillary files such as checkpoints, prompts, responses, and so on are created. Do you have visibility into these files and what analysis can you perform? 
  • Given these ancillary files are derived in part from your original data, can you download these files to your central repository or is the data locked in the vendor’s application?

Artificial intelligence as a service (AIaaS)

AIaaS is one of the many areas of AI where definitions and capabilities are a moving target. That said, we’ll define AIaaS as an outsourced service in which a cloud-based company gives other organizations access to AI models, algorithms, and other resources directly through its cloud computing platform, via a user interface (UI), API, or SDK. The aim is to provide a user-friendly interface that makes training and deploying AI models accessible to non-AI experts.

AIaaS is worth considering if you’re interested in working with artificial intelligence but you don’t have the in-house resources or expertise to build and manage your own AI technology. There is a broad range of solutions in this space, varying by the services provided. Let’s categorize them as follows:

  • Walled gardens:
    • What they offer: In my experience, AIaaS providers in this group usually host most or all of the model training data, checkpoints, inferences, and prompts. 
    • Pros and cons: This is the most straightforward option, but in practice, this method can be cost prohibitive and lacks transparency. There are few, if any, options to reduce the cost or economically transfer the model, its work products, or its data elsewhere.
    • Who are they: The obvious ones that come to mind for me are companies like AWS, Google, and IBM Watson.
  • Mix-and-match:
    • What they offer: Solutions in this group vary by the services they provide as well as add-on options and support services. They typically provide hosting services which are used to train, deploy, and use the model. They can also provide data analysis and cleansing for the model input, model testing, engineering support, and general support services as you might require.
    • Pros and cons: As with the walled garden approach, once data is ingested or ancillary data is created within the system, it may be difficult to access and, if available, expensive to retrieve. These providers are also often specialists: for instance, companies that solve one type of problem, like computer vision vs. natural language processing, or, alternatively, companies that focus on AI in IT operations, call center operations, cybersecurity, etc.  
    • Who are they: This group includes companies like Twelve Labs, Proofpoint, or Amplify. Note that there’s a bit of a porous line between some of the providers in this category and the following—think of it like a gradient. 
  • Open cloud:
    • What they offer: Providers in this group offer a variety of tools and services that, when combined, allow an organization to construct, test, operate, and maintain an AI-based solution. 
    • Pros and cons: The open cloud approach allows you to select best-of-breed providers for the various stages of your AI project. It also gives you control over the model and its byproducts, such as checkpoint data, inferences, and prompts, which are key to ensuring the model is performing as expected. In short, while this approach requires more effort on your part, you will have more control over your model and, more importantly, over your data.
    • Who are they: This includes platforms like Hugging Face and vendors like OpenAI of ChatGPT fame. Hugging Face is intentionally open source, whereas OpenAI is under pressure to monetize its models, one of the bigger evolving conversations in the AI landscape. Today, anyone can purchase API access from OpenAI to use its GPT-4 chat models from their own application. Such subscriptions offer quick access for organizations that want a mature model but can’t build one themselves or aren’t interested in doing so. 

The AIaaS approach is a good choice for organizations that lack expertise in building and operating AI systems. The approach you take (walled garden, mix-and-match, or open cloud) will affect how much access and flexibility you have with the data used and produced by the system. This may not matter to you today, but as your organization becomes more AI-savvy, being able to access and share the data within the system could become important.

Foundation models

The term “foundation model” originated with the Stanford Institute for Human-Centered Artificial Intelligence’s (HAI) Center for Research on Foundation Models (CRFM) which defines it as “any model that is trained on broad data that can be adapted (e.g., fine-tuned) to a wide range of downstream tasks.” Most, but not all, foundation models are generative AI in form and perform tasks such as language processing, visual comprehension, code generation, and human-centered engagement.

Although foundation models are pre-trained, they can still be adapted after training. An organization can develop tailored outputs using techniques such as prompt engineering, fine-tuning, and pipeline engineering. For example, prompt engineering involves crafting and iteratively refining prompts, often including instructions and examples, so that the model produces more precise answers about a given subject matter. Fine-tuning goes further by continuing to train the model on your own data. Either way, the model’s output becomes less generic and more specific to your organization.

When using a foundation model, you will need to capture and store all data used to tailor the model, for example, the prompts and responses used in the prompt engineering process. This will allow you to analyze how the model’s inferences shift over time. 
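As a rough illustration of that capture step, here is a minimal sketch that appends each prompt/response pair to a JSON Lines file along with a timestamp and model version, then reads the log back for analysis. The file name, field names, and the idea of syncing the file to a storage bucket are assumptions for illustration, not a prescribed format.

```python
import json
import time
from pathlib import Path

# Hypothetical log location; in practice this file (or its contents) would be
# copied to your cloud storage bucket on a schedule.
LOG_PATH = Path("prompt_log.jsonl")


def log_interaction(prompt: str, response: str, model_version: str,
                    path: Path = LOG_PATH) -> dict:
    """Append one prompt/response pair as a single JSON Lines record."""
    record = {
        "ts": time.time(),               # when the interaction happened
        "model_version": model_version,  # which model or checkpoint answered
        "prompt": prompt,
        "response": response,
    }
    with path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
    return record


def load_interactions(path: Path = LOG_PATH) -> list[dict]:
    """Read the log back, e.g., to compare answers across model versions."""
    if not path.exists():
        return []
    with path.open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]
```

Keeping one record per line makes the log append-only and easy to split, compress, and upload in batches.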

Using a foundation model as a starting point is a good choice, but techniques such as prompt engineering are far from an exact science. Such tuning can often exacerbate a subtle bias in the existing model or introduce a new one. This is especially true if the model is public facing.

Retrieval augmented generation (RAG)

Retrieval augmented generation (RAG) is a relatively new technique that allows AI models to draw on external sources. The model being augmented is, in most cases, a generative AI model such as a large language model (LLM). With RAG, external resources, often rich in technical content, are consulted during inference and used to inform the response to the user. One commonly cited example is indexing medical journals so that relevant content is retrieved and reviewed when the model generates a response. The same could be done with financial data, legal case law, and so on. 

RAG works by adding a retrieval layer around the original generative AI model. Defined external resources are periodically processed and converted into machine-readable indices (typically embeddings stored in a vector database). At inference time, the passages most relevant to the user’s prompt are retrieved from the index and supplied to the model along with the prompt. This means the core generative model does not have to be retrained; instead, it can use new or updated sources on the fly. This allows you to use your data to make the model your own and lets you update the data sources to keep the model current.
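To make the retrieve-then-generate flow concrete, here is a deliberately toy sketch. It swaps in word-count vectors and cosine similarity as a stand-in for a real embedding model and vector database, and it only builds the augmented prompt; the call to an actual LLM is left out. All names here are illustrative.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    # Toy "embedding": word counts. A real RAG pipeline would call an
    # embedding model and store dense vectors in a vector database.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class ToyIndex:
    """In-memory stand-in for a vector database."""

    def __init__(self):
        self.docs: list[tuple[str, Counter]] = []

    def add(self, doc: str) -> None:
        # Indexing step: convert each source document into a vector once.
        self.docs.append((doc, embed(doc)))

    def search(self, query: str, k: int = 2) -> list[str]:
        q = embed(query)
        ranked = sorted(self.docs, key=lambda d: cosine(q, d[1]), reverse=True)
        return [doc for doc, _ in ranked[:k]]


def build_prompt(index: ToyIndex, question: str, k: int = 2) -> str:
    # Retrieval step: fetch the most relevant passages and prepend them to
    # the user's question. The generative model itself is not retrained.
    context = "\n".join(f"- {p}" for p in index.search(question, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"
```

Updating the knowledge the model draws on is then just a matter of calling `add` with new documents; the generative model stays untouched.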

This technique is extremely powerful, but it does require you to store the original model, the testing or validation data used, the external resources you are using to augment the model, their vector databases, and any prompts and inferred responses. Given the tools and utilities you will use to monitor and analyze how your RAG-infused AI model is performing, a central cloud storage repository is a good choice for storing this data.

It’s all about the data—Your data

AI, at least in its current form, is not deus ex machina. Yes, ChatGPT and its ilk can create wonderful stories of fact or fiction and amazing, never-before-seen imagery, but without your data, they are marvelously generic. In other words, you, and more precisely your data, are the key to the value your organization will achieve in using AI.

As we have seen, there are a multitude of options. At one end, we can hand off our data to a company, pay them handsomely, and let them build and run our AI models: the walled garden approach. While this is enticing, the reality is that AI is still a moving target with few rules and regulations in place, your visibility into what is happening to your data is limited, and so is your ability to act if there is a problem. 

At the other end is the open cloud approach. This allows you to choose best-of-breed cloud-based applications and cloud compute services to create and run your model. These applications and services can interact freely with your cloud storage platform to leverage your organization’s data while providing you complete visibility and control. Yes, it will require more investment on your part, but given how immature AI still is, it makes sense for you to keep a watchful eye on how AI is used in your organization and, more importantly, how well it is performing.

In short, AI requires your data to be truly useful to your organization. AI in its current form is still a young science, one that requires monitoring to ensure it does what is expected. That’s not paranoia; that’s just good business. To do this, you will need unfettered, affordable access to your data, the AI model, and its work products.

The post AI for Enterprise: Getting Started appeared first on Backblaze Blog | Cloud Storage & Cloud Backup

Backblaze Adds Canadian Region, Expanding Location Choices and Data Sovereignty Options

Post Syndicated from Chris Opat original https://www.backblaze.com/blog/backblaze-adds-canadian-region-expanding-location-choices-and-data-sovereignty-options/

A decorative image showing Backblaze data regions, represented as clouds, around the planet.

Customers that have data governance, compliance, and performance top of mind now have more options for achieving all three with the opening of our new data region, Canada East (or CA East). The region is now available for current and future Backblaze customers.  

This new region builds on our mission to deliver high-performance, compliant, and cost-effective cloud storage solutions to businesses around the world and further expands our footprint in the North American market.

Meeting the needs of Canadian businesses

Our new CA East region is located in Toronto, Ontario, and has been designed to address the specific needs of Canadian businesses and organizations, many of which are subject to laws and regulations requiring data to be stored within the country. With this expansion, customers are able to ensure compliance with local regulations while taking advantage of a robust cloud solution that prioritizes data sovereignty. 

A local region also delivers performance benefits. By reducing the distance that data needs to travel, Backblaze can offer lower latency and faster transfers for Canadian customers, making the region well suited to real-time applications and large-scale data transfers.

Strengthening our partnership with Opti9

In collaboration with Opti9, an international leader in hybrid cloud solutions and a Veeam Cloud Storage Provider (VCSP), this expansion marks a significant opportunity for us to deliver robust managed services to Canadian businesses. Opti9, as the exclusive Canadian channel partner for Backblaze B2 Reserve and the Powered by Backblaze program, is uniquely positioned to bring this enhanced offering to market.

Opti9 and Backblaze share a unified vision of providing Canadian businesses and organizations with cutting-edge cloud solutions that are both affordable and high performing. Cloud data storage is evolving rapidly to meet changing customer needs. We are excited to launch this Canadian storage region in collaboration with Backblaze, which expands our overall cloud storage footprint in Canada. This partnership equips our Canadian partners and end-user organizations with the tools they need to thrive in today’s fast-evolving digital landscape.

—Cory Mac Donell, Vice President of Sales & Business Development, Opti9

Protecting data within borders

Canada’s cloud services market is expanding rapidly, driven by increased demand from industries such as healthcare, finance, and government—all of which often require data to remain within national borders. The new data region gives Canadian and international businesses more choice for storing their data while maintaining data sovereignty.

Competitive edge through open cloud solutions

Multi-cloud and hybrid cloud strategies are increasingly common. Businesses want open, interoperable solutions that avoid vendor lock-in and let them integrate the best services from multiple providers. Our offerings provide the flexibility and control businesses need while still meeting their security, compliance, data governance, and performance requirements through a local data center. The new region enables companies doing business in Canada to adopt multi-cloud and hybrid cloud strategies as they strengthen their cloud infrastructure.

Security and compliance details for the Canadian region

The Toronto data center has been assessed and maintains a security program that addresses the requirements of SOC 1 Type 2, SOC 2 Type 2, ISO 27001, PCI DSS, and HIPAA. These assessments help businesses in regulated industries meet their security and compliance obligations. 

Ready to store data in CA East?

The new data region is available to customers now. To use it, select CA East in the region drop-down when creating a Backblaze account. Already storing data with Backblaze and want to keep a Canadian copy? Use our Cloud Replication feature to diversify your storage. 

We’ll have more stories to tell about bringing up the data center and some of the interesting networking there, so stay tuned to the blog!

The post Backblaze Adds Canadian Region, Expanding Location Choices and Data Sovereignty Options appeared first on Backblaze Blog | Cloud Storage & Cloud Backup

Bookblaze: The Third Annual Backblaze Book Guide

Post Syndicated from Stephanie Doyle original https://www.backblaze.com/blog/bookblaze-the-third-annual-backblaze-book-guide/

A decorative image showing a book and a cozy library.

It’s time once again for our annual book guide, where Backblaze authors give you the inside scoop on what they’ve been reading. So, whether the weather outside is frightful, or, like at our home office in San Mateo, weird and drizzly, we hope you enjoy!

Pat Patterson, Chief Technical Evangelist

The cover image of Never Understood.

Never Understood: The Jesus and Mary Chain, by William Reid and Jim Reid

I love a good book about music, and when I saw autographed copies of “Never Understood” on sale at the merchandise stand at the Jesus and Mary Chain’s San Francisco gig earlier this year, I could not walk away without buying one. The book is co-authored by William and Jim Reid, the Scottish brothers who have been the only consistent band members since they started making music in the early ’80s. It alternates between their accounts, from early life in a Glasgow tenement, through growing up listening to the Velvet Underground, Iggy Pop, and Bowie in the nearby post-war new town of East Kilbride, to realizing that the band each of them wanted to form on their own was actually the same band, and the subsequent rollercoaster ride of recording, touring, breaking up, and getting back together.

There’s a lot of humor amongst the rock and roll excess—one of my favorite moments was the contrasting explanations of how they assigned roles as they were getting started. From William: “It wasn’t like it was Jim’s dream to be the singer—we basically had a big fight about who was gonna sing and he lost.” Jim writes: “We actually tossed a coin for it, but the outcome was the same: William won. I was the singer.” Comedy soon turns to tragedy, however, as Jim explains how he turned to heavy drinking to overcome his shyness of singing on stage, setting the scene for a lifelong battle with alcohol.

Lee Brackstone, the book’s editor, deserves credit for the excellent job he’s done stitching this all together. Even though the viewpoint bounces between the two brothers, it reads as a single narrative. William’s passages are set in a serif font, while Jim’s are sans, so you quickly develop a feel for who you’re reading. It’s a riveting tale, whether you love or hate the band’s music—I envy you listening to their debut album Psychocandy for the first time if you don’t fall into either of those camps—and the brothers’ love/hate relationship brings a poignant dimension to what is already a classic story of early success, record label indifference and shenanigans, figuring out how to play the music you hear in your head, and being shocked that other people actually want to hear it too.

Yev Pusin, Sr. Director, Marketing

The cover image of Impact Winter by Travis Beacham.

Impact Winter, by Travis Beacham

A comet strikes the earth and blocks out the sun. Bad news for people, good news for vampires. If you like the concept of 30 Days of Night and enjoy great world-building and storytelling with a bloody twist, this is a fantastic addition to your schedule. Bonus: It’s an audio drama, so it’s perfect for your commute.

Jeremy Milk, Sr. Director, Product Marketing

The cover image of How Big Things Get Done by Dan Gardner and Bent Flyvbjerg.

How Big Things Get Done, by Dan Gardner and Bent Flyvbjerg

I stumbled upon this book right around the time one big thing in my life was proceeding nicely and another was not. Why? This book didn’t give me all the answers—sorry, there are no silver bullets—yet it provided a digestible, pragmatic framework for successfully managing big projects and initiatives, with situational awareness of the psychology of the many stakeholders who will be key to success. As an impatient person who also likes to plan, I took away new nuance from the authors’ Think Slow, Act Fast model. And, as a student of Eric Ries’ The Lean Startup model, I appreciate the authors of this book adding their own flavor of MVP with the Maximum Virtual Product concept: when you simply cannot lean-test something as big as you envision, you can still develop virtual proxies to test underlying assumptions and elements. Now I’m ready to tackle many more big things.

Nicole Gale, Marketing Operations Manager

The cover image for The Women by Kristin Hannah.

The Women, by Kristin Hannah

I love historical fiction, and The Women is the first book I’ve read about the Vietnam War. As a big Kristin Hannah fan, I love how she weaves different stories about a historical event into her own narrative. The book immerses you in how women were treated during the Vietnam War, and I’ll never forget their stories. This one is a must-read!

David Johnson, Product Marketing Manager

The cover image for the book The Coming Wave by Mustafa Suleyman.

The Coming Wave: Technology, Power, and the Twenty-First Century’s Greatest Dilemma, by Mustafa Suleyman

I’d suggest “The Coming Wave” by Mustafa Suleyman. It offers an insightful perspective on the evolving world of artificial intelligence and its impact on society. It’s about a year old now, but still great in my opinion.

Bala Krishna Gangisetty, Sr. Product Manager

The cover image for Mindset by Carol Dweck.

Mindset: The New Psychology of Success, by Carol Dweck

This book fundamentally changed how I see things and how I perceive challenges and setbacks. Growing up, I was wired to strive for perfection in everything I did, and this book shifted my focus from being perfect to continuous improvement. It helped me see opportunities for learning and growth when things don’t go as planned. The best part is that the ideas in this book work for all parts of life, not just work.

Teresa Dodson, Sr. Director, Partner Marketing and Alliances

The cover image for Dare to Lead by Brene Brown.

Dare to Lead: Brave Work. Tough Conversations. Whole Hearts., by Brené Brown

From the official summary: Leadership is not about titles, status, and wielding power. A leader is anyone who takes responsibility for recognizing the potential in people and ideas, and has the courage to develop that potential. Check it out!

Stephanie Doyle, Writer and Content Operations Strategist

The cover image for Skyward by Brandon Sanderson.

The Skyward Trilogy, by Brandon Sanderson

I suppose it’s cheating a bit to recommend a whole series, but the story arc in this series by fantasy heavyweight Brandon Sanderson is great! Full disclosure: I’m hit or miss on Brandon Sanderson’s wider works. (I hate Mistborn and love The Way of Kings. Feel free to get mad at me in the comments.) That said, this series starts with a plucky young heroine on a dystopian planet (don’t worry folks: no love triangle in this one—if you know, you know) and extends into a fascinating view of space travel, AI, and what it means to have a soul.

Happy Reading from Backblaze

We hope this list piques your interest—we may be a tech company, but nothing beats a good old-fashioned book (or audiobook) to help you unwind, disconnect, and lose yourself in someone else’s story for a while.

Any reading recommendations to give us? Let us know in the comments.

The post Bookblaze: The Third Annual Backblaze Book Guide appeared first on Backblaze Blog | Cloud Storage & Cloud Backup

5 Ways Event Notifications Strengthens Your Backup Strategy Automatically

Post Syndicated from David Johnson original https://www.backblaze.com/blog/5-ways-event-notifications-strengthens-your-backup-strategy-automatically/

A decorative image showing a cloud with diagrammed icons around it.

“Our backups are good, right?”

If you’re responsible for backup operations, you’ve probably heard this question more times than you can count. While the answer should be a simple “yes,” staying on top of backup activities often involves checking multiple systems, reviewing logs, and maintaining manual tracking processes.

Today, I’m sharing five ways you can implement Backblaze Event Notifications into your data protection strategy to keep you and your team informed. If you’re interested in Event Notifications for other use cases, check out our posts for media production and application workflows.

Event Notifications for IT backup: Simplified automation

Event Notifications monitors your B2 Cloud Storage buckets for data changes that you designate—like completed backups, file deletions, or policy violations—and delivers real-time alerts where you want them. These alerts can trigger automated actions in any system that accepts webhooks, from PagerDuty to Zendesk to Slack channels and more.

Think of it as your storage system’s notification service: instead of discovering changes during routine recovery verification checks, you get instant awareness when something happens to the data in your buckets.

What are webhooks?

Webhooks, if you’re not familiar, are a way for applications to communicate with each other by automatically sending data when specific events occur, typically as an HTTP POST request with a JSON payload. What sets Backblaze Event Notifications apart is that it works with any service that accepts webhooks. This means you can integrate backup monitoring into your existing tools and processes rather than being locked into specific vendors’ ecosystems.
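For a sense of what sits on the receiving end, here is a minimal webhook receiver sketch using only Python’s standard library. It verifies an HMAC-SHA256 signature over the raw request body and then reads a list of events from the JSON payload. The header name (`x-bz-event-notification-signature`), the `v1=<hex digest>` format, the top-level `events` list, and the secret value are all assumptions for illustration; confirm them against the Event Notifications documentation before relying on this.

```python
import hashlib
import hmac
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# ASSUMPTIONS: signature header name and "v1=<hex HMAC-SHA256 of raw body>"
# format, plus a top-level "events" list in the payload. Verify in the docs.
SIGNATURE_HEADER = "x-bz-event-notification-signature"
SIGNING_SECRET = b"replace-with-your-signing-secret"  # hypothetical value


def verify_signature(body: bytes, header_value: str, secret: bytes = SIGNING_SECRET) -> bool:
    expected = "v1=" + hmac.new(secret, body, hashlib.sha256).hexdigest()
    # compare_digest avoids leaking timing information during comparison.
    return hmac.compare_digest(expected.encode(), (header_value or "").encode())


def parse_events(body: bytes, header_value: str, secret: bytes = SIGNING_SECRET) -> list:
    """Return the events in a webhook POST, or raise ValueError if it is not trustworthy."""
    if not verify_signature(body, header_value, secret):
        raise ValueError("invalid webhook signature")
    payload = json.loads(body)  # JSONDecodeError is a subclass of ValueError
    return payload.get("events", [])


class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)
        try:
            events = parse_events(body, self.headers.get(SIGNATURE_HEADER, ""))
        except ValueError:
            self.send_response(401)  # reject unsigned or malformed requests
            self.end_headers()
            return
        for event in events:
            print(event.get("eventType"), event.get("objectName"))
        self.send_response(200)
        self.end_headers()


if __name__ == "__main__":
    HTTPServer(("0.0.0.0", 8080), WebhookHandler).serve_forever()
```

Keeping the parsing in a plain function, separate from the HTTP handler, makes the verification logic easy to unit test and reuse behind any web framework.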

5 ways to stay in the know with your backup strategy

Here are specific, practical ways you can take advantage of Event Notifications for immediate benefits to your backup and archive workflows.

1. Backup verification and reporting

When your backup software writes files to B2 Cloud Storage, Event Notifications helps verify successful completion of backup jobs. Each time a backup file lands in a bucket, you’ll receive a notification with key details like file size, timestamp, and backup job name. By feeding this data directly into communication tools like Slack, you can maintain comprehensive logs of backup activity without manual checks.

Backup monitoring workflow

Gone are the days of discovering backup issues hours or days later during routine reviews; you’ll know exactly when backups are uploaded. Teams can configure custom alerts for backup size thresholds and receive immediate confirmation of successful backups. With the help of Zapier, you can also set up an alert for when Event Notifications did not trigger, indicating that a backup was not uploaded during a specified window.
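The “no notification means no backup” check can be expressed in a few lines. This sketch, meant to run on a schedule after each backup window, looks for upload events under a given object-name prefix within the window and flags uploads that look suspiciously small. The event field names (`eventType`, `objectName`, `objectSize`, `eventTimestamp` in milliseconds) and the `b2:ObjectCreated` prefix are assumptions to check against the actual payload schema; the prefix and thresholds are hypothetical.

```python
from datetime import datetime, timezone

# ASSUMED event fields (verify against the Event Notifications payload docs):
# eventType, objectName, objectSize (bytes), eventTimestamp (ms since epoch).


def uploads_in_window(events, prefix, start, end):
    """Upload events for objects under `prefix` between start and end (UTC datetimes)."""
    hits = []
    for e in events:
        if not e.get("eventType", "").startswith("b2:ObjectCreated"):
            continue
        if not e.get("objectName", "").startswith(prefix):
            continue
        ts = datetime.fromtimestamp(e["eventTimestamp"] / 1000, tz=timezone.utc)
        if start <= ts < end:
            hits.append(e)
    return hits


def check_backup_window(events, prefix, start, end, min_bytes=1):
    """Return alert messages; an empty list means the window looks healthy."""
    hits = uploads_in_window(events, prefix, start, end)
    if not hits:
        return [f"No backup uploaded under '{prefix}' between "
                f"{start:%Y-%m-%d %H:%M} and {end:%H:%M} UTC"]
    # Flag uploads below a size threshold (possible truncated or empty backup).
    return [f"Backup {e.get('objectName')} is only {e.get('objectSize', 0)} bytes"
            for e in hits if e.get("objectSize", 0) < min_bytes]
```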

2. Security and compliance monitoring

Event Notifications can help protect your backup data from unauthorized changes. Security teams can establish automated alerts for suspicious activities like mass deletions or modifications. These alerts integrate with your existing security information and event management (SIEM) systems to provide unified threat monitoring.

Security alert workflow

Beyond threat detection, Event Notifications enables preemptive policy enforcement. Teams can configure automatic notifications that guide employees when their actions might conflict with backup policies, like modifying file names, moving files, or even deleting them. For persistent policy conflicts, managers can receive automated escalation alerts to address potential training needs or process gaps. This systematic approach helps maintain backup integrity through education and awareness before issues occur, rather than just detecting violations after the fact.
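A simple way to turn delete notifications into a “mass deletion” alert is a sliding-window counter. This sketch assumes the same hypothetical event fields as above, treats any `b2:ObjectDeleted…` event type as a delete, and assumes events arrive in roughly timestamp order. The threshold and window size are illustrative and should be tuned to your normal lifecycle and retention activity.

```python
from collections import deque


class MassDeletionDetector:
    """Flag bursts of delete events: more than `threshold` deletes within `window_s` seconds.

    ASSUMPTIONS: events carry "eventType" and "eventTimestamp" (ms since epoch)
    and arrive roughly in timestamp order. Deletes made by lifecycle rules are
    counted too; filter them out first if they are expected.
    """

    def __init__(self, threshold=100, window_s=300):
        self.threshold = threshold
        self.window_ms = window_s * 1000
        self.recent = deque()  # timestamps (ms) of deletes inside the window

    def observe(self, event) -> bool:
        """Feed one event; returns True while the delete rate is above the threshold."""
        if not event.get("eventType", "").startswith("b2:ObjectDeleted"):
            return False
        ts = event["eventTimestamp"]
        self.recent.append(ts)
        # Drop deletes that have fallen out of the sliding window.
        while self.recent and ts - self.recent[0] > self.window_ms:
            self.recent.popleft()
        return len(self.recent) > self.threshold
```

When `observe` returns True, the handler could forward an alert to your SIEM or paging tool.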

3. Storage management automation

Storage management becomes more efficient when Event Notifications feeds activity data directly to your management tools. As files are uploaded to and removed from your buckets over time, Event Notifications provides valuable data that helps you analyze storage utilization trends and backup data growth patterns.

Data usage monitoring workflow

This constant flow of information empowers teams to anticipate capacity needs and optimize resource allocation. Moving from reactive to proactive storage management also helps control costs; for example, you can be notified when backups become larger on average.
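One way to get the “backups are getting bigger” signal is to aggregate upload events into a weekly average size and compare consecutive weeks. This is a minimal sketch under the same assumed event fields; the 20% growth threshold is a hypothetical default.

```python
from collections import defaultdict
from datetime import datetime, timezone


def weekly_average_sizes(events):
    """Average uploaded object size (bytes) per ISO week, in chronological order."""
    totals = defaultdict(lambda: [0, 0])  # (year, week) -> [total_bytes, count]
    for e in events:
        if not e.get("eventType", "").startswith("b2:ObjectCreated"):
            continue
        ts = datetime.fromtimestamp(e["eventTimestamp"] / 1000, tz=timezone.utc)
        year, week, _ = ts.isocalendar()
        totals[(year, week)][0] += e.get("objectSize", 0)
        totals[(year, week)][1] += 1
    return {wk: t / n for wk, (t, n) in sorted(totals.items())}


def growth_alert(averages, max_growth=0.20):
    """Message if the latest week's average exceeds the previous week's by more than max_growth."""
    weeks = list(averages)
    if len(weeks) < 2:
        return None
    prev, last = averages[weeks[-2]], averages[weeks[-1]]
    if prev and (last - prev) / prev > max_growth:
        return f"Average backup size grew {100 * (last - prev) / prev:.0f}% week over week"
    return None
```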

4. Cross-bucket backup monitoring

Organizations using Cloud Replication or managing backups across multiple buckets gain valuable oversight through Event Notifications. This capability tracks file replication between regions and monitors backup activity across your entire footprint, giving you a comprehensive view of your distributed backup strategy. Teams can spot replication delays or issues immediately, rather than waiting for scheduled status checks.

Cloud Replication notification workflow

Understanding how data moves and grows across different locations ensures your distributed backup strategy performs as designed. Event Notifications makes it possible to track successful replications, monitor consistency between primary and replica buckets, and receive immediate alerts about any issues. This visibility is especially valuable for organizations maintaining geographic redundancy or managing complex multi-site backup strategies.
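To track replication between a source and a replica bucket, you can pair upload events from the source with replica-creation events from the destination. This sketch assumes replica events can be recognized separately (for example, by an event type such as `b2:ObjectCreated:Replica`, which you should confirm in the docs) and keys objects by name only, so it ignores multiple versions of the same file. The one-hour lag threshold is hypothetical.

```python
import time


def replication_report(source_events, replica_events, max_lag_s=3600, now_ms=None):
    """Compare source uploads with replica-created events in the destination bucket.

    Returns (lagging, missing):
      lagging - (object name, lag in seconds) for replicas slower than max_lag_s
      missing - object names with no replica event that are older than max_lag_s
    ASSUMPTION: events carry "objectName" and "eventTimestamp" (ms since epoch).
    """
    now_ms = now_ms if now_ms is not None else int(time.time() * 1000)
    replicated = {e["objectName"]: e["eventTimestamp"] for e in replica_events}
    lagging, missing = [], []
    for e in source_events:
        name, created = e["objectName"], e["eventTimestamp"]
        if name in replicated:
            lag_s = (replicated[name] - created) / 1000
            if lag_s > max_lag_s:
                lagging.append((name, lag_s))
        elif (now_ms - created) / 1000 > max_lag_s:
            missing.append(name)  # still no replica well past the allowed lag
    return lagging, missing
```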

5. Integration with IT workflows

Event Notifications connects seamlessly with existing IT tools and processes through standard webhooks. Backup events can automatically flow into ticketing systems like Jira Service Management, monitoring dashboards like Grafana, or team communication channels like Microsoft Teams and Mattermost. This integration means teams can manage backup operations through familiar tools and processes, without needing to constantly switch between different interfaces or learn new systems.

Data integration workflow

The result is streamlined operations without the need for separate backup monitoring systems, ensuring backup activities receive proper attention within normal IT procedures. Teams can create ServiceNow tickets for failed backups, update Jira boards with backup status, or send notifications to Teams channels—all automatically and in real-time.
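Forwarding an event into a chat channel usually means reshaping it into the destination’s webhook format and POSTing it. This sketch targets the simple `{"text": ...}` payload used by Slack-style incoming webhooks. The webhook URL is a placeholder, and the event field names are the same assumptions as above; ticketing tools such as Jira or ServiceNow expect their own payload formats.

```python
import json
import urllib.request

# Hypothetical incoming-webhook URL for a Slack-compatible chat tool.
CHAT_WEBHOOK_URL = "https://hooks.example.com/services/T000/B000/XXXX"


def format_event(event: dict) -> dict:
    """Turn one storage event into a simple chat message payload ({"text": ...})."""
    size_mb = event.get("objectSize", 0) / 1_000_000
    return {
        "text": f"{event.get('eventType', 'unknown event')} in "
                f"{event.get('bucketName', '?')}: {event.get('objectName', '?')} "
                f"({size_mb:.1f} MB)"
    }


def post_json(url: str, payload: dict, timeout: float = 10.0) -> int:
    """POST a JSON payload and return the HTTP status code."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status


# Usage (not run here): post_json(CHAT_WEBHOOK_URL, format_event(event))
```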

Why Event Notifications makes sense for backup teams

Managing backup operations has traditionally meant juggling multiple monitoring tools and hoping you catch issues before they impact recovery capabilities. Event Notifications transforms this approach by providing:

  • Automated awareness: Replace manual checks with instant visibility into bucket changes.
  • Enhanced security: Track backup data access and modifications as they happen.
  • Simplified monitoring: Feed backup activity data directly to your management tools.
  • Better operations: Free up time to focus on improving backup strategies rather than monitoring them.
  • Flexible integration: Adapt backup monitoring to fit your existing processes, not the other way around.

How it works with your environment

Unlike traditional backup monitoring solutions that often require specific software for notification handling, Event Notifications works with any service that accepts webhooks. This fundamental difference means you aren’t locked into specific vendors’ ecosystems or forced to use particular monitoring tools.

Event Notifications is designed for reliability with at-least-once delivery, so critical backup events aren’t missed; the trade-off is that an occasional event may be delivered more than once, so receiving workflows should tolerate duplicates. This reliability is especially important for teams building automated workflows that require consistency and transparency in their backup monitoring.
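Because delivery is at-least-once, a receiver can occasionally see the same event more than once, so handlers should be idempotent. A common pattern is to remember recently seen event IDs and skip repeats. This sketch assumes each event carries a unique identifier in an `eventId` field (check the payload schema) and keeps a bounded in-memory history.

```python
from collections import OrderedDict


class EventDeduplicator:
    """Drop repeat deliveries of the same event.

    ASSUMPTION: each event has a unique "eventId" field.
    """

    def __init__(self, max_remembered=10_000):
        self.seen = OrderedDict()
        self.max_remembered = max_remembered

    def is_new(self, event: dict) -> bool:
        event_id = event["eventId"]
        if event_id in self.seen:
            return False
        self.seen[event_id] = True
        # Bound memory by forgetting the oldest IDs first.
        if len(self.seen) > self.max_remembered:
            self.seen.popitem(last=False)
        return True
```

In production, you would likely keep this history in a shared, durable store so that duplicates are still caught after a restart or across multiple receiver instances.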

The pricing model is straightforward and predictable: Backblaze B2 Reserve customers receive unlimited notifications at no additional cost, while pay-as-you-go customers get 2,500 notifications free each day and pay just $0.004 per 10,000 additional calls. This transparent pricing applies regardless of which services you’re connecting to, enabling teams to build comprehensive backup monitoring without worrying about unpredictable costs.
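The pay-as-you-go arithmetic is easy to sanity-check. This sketch applies the figures above (2,500 free notifications per day, then $0.004 per 10,000) and assumes linear proration of partial 10,000-notification blocks, which may differ from how your invoice actually rounds.

```python
FREE_PER_DAY = 2_500   # free notifications per day (pay-as-you-go)
PRICE_PER_10K = 0.004  # USD per 10,000 additional notifications


def daily_notification_cost(notifications: int, reserve_customer: bool = False) -> float:
    """Estimated USD cost for one day's notifications.

    ASSUMPTION: partial 10,000-notification blocks are prorated linearly.
    """
    if reserve_customer:
        return 0.0  # B2 Reserve: unlimited notifications at no additional cost
    billable = max(0, notifications - FREE_PER_DAY)
    return billable / 10_000 * PRICE_PER_10K
```

For example, 102,500 notifications in a day leaves 100,000 billable notifications, or about $0.04 for that day.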

Ready to automate your backup monitoring?

If you’re working with a Backblaze account manager, Event Notifications are already enabled—just ask them for setup guidance. Other existing customers can contact our Support team to request access.

New to Backblaze? Contact our Sales team to learn how Event Notifications can strengthen your backup operations.

Once the feature is enabled, visit the Event Notifications section in your B2 Cloud Storage buckets to configure your alerts. For detailed setup instructions and best practices, check out our Event Notifications documentation.

The post 5 Ways Event Notifications Strengthens Your Backup Strategy Automatically appeared first on Backblaze Blog | Cloud Storage & Cloud Backup

Disaster Recovery 101: Navigating Backup and Archive Infrastructure

Post Syndicated from Kari Rivas original https://www.backblaze.com/blog/disaster-recovery-101-navigating-backup-and-archive-infrastructure/

An illustration of a city scape with lines travelling up to a cloud representing digital transmission.

Aging infrastructure, strained budgets, and exponential data growth create unique challenges for disaster recovery (DR) planning. When assessing your backup and archive infrastructure, you’re probably balancing data governance, data sovereignty requirements, compliance requirements, and the needs of your end users. Many legacy data storage systems can create gaps in an otherwise airtight DR plan. 

Today, I’m talking through how to approach infrastructure decisions for your cyber resilience posture. You have a lot of options. On-premises? Cloud services? Hot? Warm? Cold? What combination works best for your needs? Understanding the nuances can help you sharpen your strategy.

Disaster recovery challenges

1. Relying on on-premises backup and archive infrastructure

Traditionally, businesses have relied heavily on on-premises backup solutions. Robust storage systems hold critical data, often backed up to secondary storage within the same physical location. While this approach offers a sense of control, it also presents vulnerabilities. 

On-premises backups are at risk from localized events like power loss, fire, flooding, or other natural disasters. A geographically separate DR site or another distant off-site backup is essential for complete protection and compliance. Without one, the organization risks losing critical data in a regional outage or loss of access. 

The shift to public cloud and SaaS options opened the door to more secure and reliable data backup and disaster recovery solutions. By utilizing cloud-based storage and backup services, organizations can ensure that their data is protected in multiple locations, reducing the risk of data loss due to localized disasters. Additionally, cloud-based solutions offer scalability and flexibility, allowing organizations to easily expand their storage capacity as needed.

2. Falling into the replication trap

Many businesses have established alternate data centers as a secondary backup layer. However, these sites frequently only use replication technology. This situation can result in a scenario known as the “replication trap.” There is a risk that data compromised by malware is replicated to the DR site, leading to potential data loss. 

Off-site, immutable backups, independent of the primary site’s data, are a key component of a robust DR strategy. In cases of malware attacks or accidental data deletion by users, off-site immutable backups allow for data retrieval from a backup saved prior to the incident and reduce possible interruptions. 
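With immutable, point-in-time backups in place, recovery starts with choosing the last clean restore point. This sketch picks the newest backup taken before the incident, with an optional safety margin for cases where malware may have been present for some time before detection. The function and its margin parameter are hypothetical illustrations, not part of any specific backup product.

```python
from datetime import datetime, timedelta


def restore_point(backup_times, incident_time, safety_margin=timedelta(0)):
    """Latest backup taken before (incident_time - safety_margin), or None if none qualifies."""
    cutoff = incident_time - safety_margin
    candidates = [t for t in backup_times if t < cutoff]
    return max(candidates) if candidates else None
```

For example, with daily backups and an incident detected at midday on January 3, a one-day safety margin steps back to the January 2 backup instead of the January 3 one.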

3. Underestimating LTO limitations

Despite being viewed as a legacy technology, tape backups continue to be used in many organizations due to their reliability and cost-effectiveness. It is common to store tapes in a separate location to diversify data storage geographically, which helps reduce the impact of local disasters on data access and enhances overall data resilience. 

Off-site tape backups may increase recoverability but create challenges with recovery time objectives (RTO) because of the time it takes to retrieve tapes from a separate location and restore data from them. Tape hardware issues can also occur often and unexpectedly. Cloud-based data storage and archiving has gained popularity because of its higher availability and cost savings compared with traditional tape backups. 

The cost and time required to operate multiple data centers and meet recovery times should also be considered in the requirements for your production and DR infrastructure. Never underestimate the risk to a successful recovery when facing time-consuming tasks like physical site recovery and data restoration from tape.
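A back-of-the-envelope restore-time estimate helps compare options against an RTO. This sketch adds any logistics delay (such as retrieving tapes from an off-site vault) to the transfer time at a sustained throughput. It uses decimal units and ignores verification, media failures, and application restart time, so treat the result as a lower bound.

```python
def estimate_restore_hours(data_tb: float, throughput_mb_s: float,
                           retrieval_delay_h: float = 0.0) -> float:
    """Rough restore time: logistics delay plus transfer time.

    Uses decimal units (1 TB = 1,000,000 MB) and a constant sustained throughput.
    """
    transfer_h = data_tb * 1_000_000 / throughput_mb_s / 3600
    return retrieval_delay_h + transfer_h
```

With purely hypothetical numbers, restoring 50 TB at 300 MB/s after a 24-hour tape retrieval gives `estimate_restore_hours(50, 300, 24)`, roughly 70.3 hours, while the same 50 TB at 1,000 MB/s with no retrieval delay comes to roughly 13.9 hours.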

4. Leaving cloud-based productivity tools vulnerable

Cloud-based collaboration and communication tools like Google Drive and Microsoft 365 are commonly used by businesses, yet they are often left vulnerable to data loss. These services typically do not provide the level of protection and recovery options that organizations need. 

Businesses often find that the responsibility for backing up this data falls on their own IT, as these services typically operate under a shared responsibility model that doesn’t offer comprehensive backup solutions. 

To ensure a reliable DR plan that includes cloud services, you should: 

  • Evaluate granular recovery requirements for productivity platforms like Google Workspace and Microsoft 365. 
  • Evaluate adherence to your long-term backup retention policy, keeping in mind the regulations your business might be subject to. 
  • Determine if data stored in cloud platforms needs to be backed up with immutability due to cyber insurance requirements or other security policies. 
  • Examine best practices for comprehensive, secure data protection for shared cloud drive services and SaaS productivity tools to address the lack of built-in recovery features.
  • Plan to store true backups of your SaaS data just as you would for any other data. It may seem redundant to back up cloud platforms to the public cloud, but doing so ensures that you have the point-in-time backups you need and that you can recover on your timeline, not Google’s or Microsoft’s. 

Cloud costs will need to factor into decisions for where to store your data. Cloud storage costs should be included as a non-functional requirement to make sure you can achieve your secure recovery goals without sacrificing affordability.

Best practices for cloud-based disaster recovery

Many enterprises rely on cloud-based DR solutions to ensure uninterrupted operations, protect critical data, and maintain customer trust. Unlike traditional DR methods, cloud-based solutions offer scalability, cost-effectiveness, and rapid recovery capabilities. To truly leverage the potential of these systems, it’s important to be aware of some key strategies and considerations to optimize your cloud-based disaster recovery plan, ensuring resilience in the face of unexpected disruptions.

  • Consider diversifying your cloud portfolio: Using the same cloud service provider for your backups as for your production data may not be necessary, as you don’t need the same level of performance for backup data. You could consider a tiered recovery approach based on the criticality of your applications and data.
  • Investigate existing tools for cloud compatibility: Many on-premises data protection tools like Synology or QNAP NAS devices also support cloud targets for backup storage. It’s important to match the capabilities of your current backup vendors to your recovery requirements and cloud storage budgets. 
  • Avoid paying for storage you’re not using: Carefully read the fine print when considering cloud storage costs. Hidden fees, minimum retention requirements, and complicated pricing tiers make accurate forecasting difficult and could leave you paying for unused storage just to reach certain discount tiers. 
  • Balance your budget with RTO and RPO (recovery point objective) targets: Using cloud data storage for production, backups, and archive can lead to some price shock as your environment scales. Moving data to lower-cost storage tiers or cold storage can achieve attractive price reductions, but it often comes at the cost of recovery speed and added complexity. Look for a cloud storage provider with transparent pricing that makes it easier to plan your costs.

Finally, weigh your cloud-based options to evaluate platform compatibility, ongoing costs, and whether your cloud service provider (CSP) locks you in or out of specific ecosystems due to high storage costs, data transfer costs, and proprietary features. 

Leveraging cloud-based backup and archive infrastructure

Adopting cloud-based disaster recovery best practices is a key consideration for building a resilient and reliable business infrastructure. By developing a well-structured disaster recovery plan, determining the right mix of storage solutions, and optimizing costs with tiered recovery, businesses can minimize downtime and data loss during unexpected events. A proactive approach not only safeguards your organization’s operations but also strengthens customer trust and competitive advantage. In a world where disruptions are inevitable, being prepared is the key to bouncing back stronger and faster.

The post Disaster Recovery 101: Navigating Backup and Archive Infrastructure appeared first on Backblaze Blog | Cloud Storage & Cloud Backup

Effortlessly Managing Unfinished Large File Uploads with B2 Cloud Storage

Post Syndicated from Bala Krishna Gangisetty original https://www.backblaze.com/blog/effortlessly-managing-unfinished-large-file-uploads-with-b2-cloud-storage/

An illustration of a cloud with boxes representing data uploading to the cloud.

Digital clutter isn’t just inefficient; it can be costly. And if cleaning up digital clutter in your business operations is one of your New Year’s resolutions for 2025, this post is for you. We’re talking about managing unfinished large file uploads.

One big culprit of digital clutter when it comes to cloud storage is unfinished large files. Managing unfinished large file uploads can be a complex task. If they are not managed well, they can consume space and incur costs without any benefit. 

To address this, we’ve introduced a feature in Backblaze B2 Cloud Storage that automatically cancels unfinished large file uploads, saving you both time and money. 

The challenge: Unfinished large file uploads

To upload a large file, you break it into smaller parts. First, you start the large file upload. Then each part is uploaded, often in parallel, and once all parts are received, you send a request to finish the file. Only after this final step does the file become available for use. Sometimes, things don’t go as planned: network hiccups, API timeouts, or user interruptions can leave large file uploads unfinished. The process then likely restarts and completes successfully, but this leaves you with both a complete file and a partially uploaded file in your bucket. These unfinished uploads still take up storage space, leading to unnecessary costs.
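Under the B2 Native API, our understanding is that this flow maps to `b2_start_large_file`, `b2_get_upload_part_url`, one `b2_upload_part` call per part, and finally `b2_finish_large_file`, which takes the file ID and the SHA-1 of every part in order. The sketch below covers only the local bookkeeping (splitting data and building the finish request body). The network calls are omitted, the file ID is hypothetical, and the toy part size is far below the minimum the API actually enforces, so check the API documentation for exact limits. If the process stops before the finish call, the uploaded parts remain as an unfinished large file.

```python
import hashlib
import json

def plan_parts(data: bytes, part_size: int):
    """Split data into parts; return (part_number, sha1_hex, size) for each one."""
    if part_size <= 0:
        raise ValueError("part_size must be positive")
    chunks = [data[i:i + part_size] for i in range(0, len(data), part_size)]
    # Part numbers start at 1, in the order the parts appear in the file.
    return [(n, hashlib.sha1(c).hexdigest(), len(c)) for n, c in enumerate(chunks, start=1)]

def finish_large_file_body(file_id: str, parts) -> str:
    """JSON body for b2_finish_large_file: the fileId plus part SHA-1s in part order."""
    ordered = sorted(parts, key=lambda p: p[0])  # uploads may complete out of order
    return json.dumps({"fileId": file_id, "partSha1Array": [sha1 for _, sha1, _ in ordered]})

parts = plan_parts(b"abcdefghij", part_size=4)   # toy size for illustration only
print([(n, size) for n, _, size in parts])        # [(1, 4), (2, 4), (3, 2)]
print(finish_large_file_body("hypothetical-file-id", parts))
```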

Previously, users had to manually track down and delete these unfinished uploads, an error-prone and time-consuming task, especially with a large volume of files. 
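For comparison, the manual approach looks roughly like this: list unfinished large files, pick the ones that are too old, and cancel each one. This sketch assumes a `b2_list_unfinished_large_files` response with a `files` array whose entries carry `fileId`, `fileName`, and `uploadTimestamp` in milliseconds; the selected IDs would then be passed to `b2_cancel_large_file`. Pagination and the HTTP calls are left out, and the helper names are hypothetical.

```python
DAY_MS = 24 * 60 * 60 * 1000  # one day in milliseconds

def stale_unfinished_files(list_response: dict, max_age_days: int, now_ms: int):
    """Return (fileId, fileName) pairs for unfinished large files older than max_age_days.

    Assumes list_response mirrors a b2_list_unfinished_large_files response:
    a "files" array whose entries carry fileId, fileName, and uploadTimestamp
    (milliseconds since the epoch). Pagination is not handled here.
    """
    cutoff = now_ms - max_age_days * DAY_MS
    return [
        (f["fileId"], f["fileName"])
        for f in list_response.get("files", [])
        if f["uploadTimestamp"] < cutoff
    ]

# Hypothetical response captured on day 10: one upload started on day 1, one on day 9.
sample = {
    "files": [
        {"fileId": "id-old", "fileName": "video.mov", "uploadTimestamp": 1 * DAY_MS},
        {"fileId": "id-new", "fileName": "render.mp4", "uploadTimestamp": 9 * DAY_MS},
    ]
}
for file_id, name in stale_unfinished_files(sample, max_age_days=7, now_ms=10 * DAY_MS):
    print("would call b2_cancel_large_file for", file_id, name)
```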

The solution: Canceling unfinished uploads through lifecycle rules

To streamline the process, we’ve added a feature that allows users to automatically cancel these incomplete uploads after a set number of days. By setting lifecycle rules through the B2 Native API, users can now specify how many days an unfinished large file can remain before it’s automatically deleted. 

For detailed guidance on configuring this rule, check out our Lifecycle Rules Documentation.
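As a sketch of what such a rule might look like, the Python below builds a lifecycle rules array that cancels unfinished large files after a set number of days. The `daysFromStartingToCancelingUnfinishedLargeFiles` field name reflects our reading of the B2 lifecycle rule format, and the array would be supplied to a call such as `b2_update_bucket`; confirm both against the Lifecycle Rules Documentation. The helper and its minimum-of-one-day check are hypothetical.

```python
import json

def cancel_unfinished_rule(prefix: str, days: int) -> dict:
    """Build one lifecycle rule that cancels unfinished large files under `prefix`.

    Field names follow our reading of the B2 lifecycle rule format; verify them
    against the Lifecycle Rules Documentation. The days >= 1 check is our own guard.
    """
    if days < 1:
        raise ValueError("days must be at least 1")
    return {
        "fileNamePrefix": prefix,
        "daysFromUploadingToHiding": None,   # no hiding rule in this example
        "daysFromHidingToDeleting": None,    # no deletion rule in this example
        "daysFromStartingToCancelingUnfinishedLargeFiles": days,
    }

# Cancel any unfinished large file in the bucket (empty prefix) after 7 days.
print(json.dumps({"lifecycleRules": [cancel_unfinished_rule("", 7)]}, indent=2))
```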

Why it matters

This feature is useful in a variety of scenarios:

  1. Network failures: If a network interruption prevents the final completion step, the unfinished upload will no longer remain indefinitely. Instead, it will be automatically cleared after the defined period, ensuring you aren’t paying for useless storage.
  2. User interruptions: If an upload is manually paused or forgotten before completion, lifecycle rules will take care of these fragments, preventing forgotten uploads from lingering in storage.
  3. Script failures: If your script fails or times out during the upload process, any incomplete files won’t go unnoticed. They’ll be cleared as per your rules, ensuring efficient storage management.

Cost-saving benefits

Unfinished uploads can quickly add up, both in storage usage and costs. By automatically canceling incomplete uploads, users can significantly reduce unnecessary expenses, keeping storage budgets under control. This is especially important for businesses with large-scale data transfers, where managing storage efficiency can have a direct impact on the bottom line.

What’s next?

Most users configure lifecycle rules through the web console or the Backblaze B2 command line tool (CLI). We introduced this feature in the B2 Native API first to address immediate customer needs while laying the groundwork for integrating it into the B2 Cloud Storage web console. You can use it today via the CLI or the B2 Native API, and we’re working on adding UI support to make configuration even more accessible. Let us know in the comments if you’re looking for access to this feature via a different user interface.

In the meantime, here are a few steps you can take:

  1. Implement lifecycle rules: Set rules that fit your upload behavior. Choose a timeframe for canceling unfinished large file uploads that supports your cost-management goals while leaving enough time for legitimate long-running uploads to finish.
  2. Test the feature: Try configuring the lifecycle rule for a few test uploads to make sure it behaves as expected. Monitor how it handles interruptions or failures to ensure it aligns with your needs.
  3. Monitor storage costs: Check your storage usage and billing before and after setting these rules to understand the impact on costs. Use the feedback to fine-tune your settings.
  4. Stay tuned for UI updates: Keep an eye out for announcements regarding UI support for this feature. We’re committed to making it as intuitive and accessible as possible.

By leveraging lifecycle rules for unfinished large file uploads, you can maintain a cleaner, more efficient storage environment while saving money. For more details on configuring lifecycle rules, visit our API documentation.

The post Effortlessly Managing Unfinished Large File Uploads with B2 Cloud Storage appeared first on Backblaze Blog | Cloud Storage & Cloud Backup

AI 101: Building and Deploying an AI Model

Post Syndicated from Stephanie Doyle original https://www.backblaze.com/blog/ai-101-building-and-deploying-an-ai-model/

A decorative image showing a computer, a cloud, and a building.

Should you build your own AI model? Or use other services to help you accelerate the process?

Once you’ve defined the problem you’re trying to solve and the AI model type that best fits your needs, these are the questions you’re faced with next: where to deploy an AI model and how to go about doing it. In most cases, there is very little reason for you to build, train, and deploy your AI model from scratch, particularly as more and more vendors are stepping in to help companies with all or some of the process. Doing it yourself is fundamentally complex, takes tons of resources, and requires specialized knowledge to do correctly. 

Still, you should have a basic understanding of the AI model training and deployment processes, as this knowledge will be useful later on as you explore various predefined tools, applications, and services you can use to expedite or enhance your ability to use AI within your organization. That’s what I’m digging into today.

How AI model training works

Training an AI model involves several steps: identifying and gathering the required data, cleaning and assembling that data, training the model, checkpointing, and, finally, model serving, where the model is deployed into the production environment. Here’s an overview of the process. 

A diagram that explains the AI model training process.

Let’s take a minute to explore each of the steps in a little more detail.

Step 1: Review 

The organizational data needed to help educate your model will either be structured or unstructured. Structured data is found in databases, tables, and so on. Unstructured data is basically everything else. Some unstructured data is easy to process, such as text files, while other data is harder to extract, such as PDFs and images. 

In general, the more data you can provide, the better your trained model can be. But remember to also include examples of what you’re not looking for; this helps the model home in on the specific target when things look similar. Take this scenario, for instance:  

You are monitoring hundreds of thousands of wooded acres to determine if there is a fire on the land. As part of training the model, you need to provide images of the legitimate flora and fauna along with images of fire. But you should also provide images of what is not fire, for example reflections of the sun or moon on a lake, a group of lightning bugs at night, car headlights, and so on.

Step 2: Clean 

As the data is collected, it will need to be pre-processed, which involves techniques such as cleaning the data to handle missing values, removing outliers, scaling features, encoding categorical variables, and splitting the data into training and testing sets. The data needs to be arranged in a format the model can accept. This sounds relatively simple, but some studies show that it can take up to 80% of the total model development time.
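The toy pipeline below walks through the pre-processing techniques named above on a small list of records, using only the Python standard library. The schema (`temp`, `sensor`), the median fill, and the 1.5 × IQR outlier rule are illustrative choices, not a prescription. For brevity it scales the data before splitting; in a real project, compute scaling parameters on the training set only to avoid leaking test information.

```python
import random
import statistics

def preprocess(rows, numeric_key, category_key, test_fraction=0.25, seed=0):
    """Toy pre-processing pipeline over a list of dicts (hypothetical schema)."""
    # 1. Handle missing values: fill missing numbers with the median of the rest.
    median = statistics.median(r[numeric_key] for r in rows if r[numeric_key] is not None)
    filled = [{**r, numeric_key: median if r[numeric_key] is None else r[numeric_key]} for r in rows]

    # 2. Remove outliers with the 1.5 * IQR rule.
    q1, _, q3 = statistics.quantiles([r[numeric_key] for r in filled], n=4)
    low, high = q1 - 1.5 * (q3 - q1), q3 + 1.5 * (q3 - q1)
    kept = [r for r in filled if low <= r[numeric_key] <= high]

    # 3. Scale the numeric feature to [0, 1] (min-max scaling).
    values = [r[numeric_key] for r in kept]
    vmin, vmax = min(values), max(values)
    span = (vmax - vmin) or 1.0  # avoid dividing by zero if all values are equal

    # 4. One-hot encode the categorical feature.
    categories = sorted({r[category_key] for r in kept})
    features = [
        [(r[numeric_key] - vmin) / span] + [1.0 if r[category_key] == c else 0.0 for c in categories]
        for r in kept
    ]

    # 5. Shuffle reproducibly, then split into training and testing sets.
    random.Random(seed).shuffle(features)
    n_test = max(1, int(len(features) * test_fraction))
    return features[n_test:], features[:n_test], categories

rows = [
    {"temp": 10, "sensor": "a"}, {"temp": 12, "sensor": "b"},
    {"temp": None, "sensor": "a"}, {"temp": 11, "sensor": "b"},
    {"temp": 13, "sensor": "a"}, {"temp": 500, "sensor": "b"},  # 500 is an outlier
    {"temp": 12, "sensor": "a"}, {"temp": 11, "sensor": "b"},
]
train, test, categories = preprocess(rows, "temp", "sensor")
print(len(train), len(test), categories)  # 6 1 ['a', 'b']
```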

Step 3: Stage 

This is a collection point for all of the cleaned data that is ready to be processed. Data arrives here as it is cleaned, which can occur over several days or even weeks. Having this data on hand is useful if the model is not generated correctly, and it serves as a starting point if you need to retrain the model in the future.

Typically, large amounts of your data will be cleaned and staged as it is readied to train the AI model, but there are no special storage requirements for this data. It just needs to be readily available to be uploaded to the AI training environment when the time comes. 

Step 4: Train 

Model training is a resource-intensive process where data is copied from staging to high-performance storage located in close proximity to whatever high-powered processor you’re rocking, usually a graphics processing unit (GPU). The GPUs then run the algorithms developed specifically for training the model, and the data is iteratively read and processed an indeterminate number of times until training is complete. Minimizing the time spent using these expensive, high-powered storage and processing resources is critical to managing the overall cost of building the model. In other words: get in, process, and get out.

Step 5: Checkpoint 

During training, the training code will often create snapshots of the state of the training process, including the model’s parameters and other variables. These snapshots are referred to as checkpoints. They are initially written to local storage within the model training system and are used to restart the training process from a known good state if something goes wrong. 

Once the model training process is complete, checkpoints should be written to the same centralized data storage location as your staged data. The checkpoint data will become part of the documentation of the model and may be used for forensic purposes should the model not behave appropriately once it is deployed.
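Here is a minimal sketch of that checkpoint lifecycle, assuming the training state fits in a JSON-serializable dict. A local directory stands in for the training system’s storage, and a second directory stands in for your centralized storage location (in practice this step would be an upload, for example to a bucket). All function names are hypothetical.

```python
import json
import os
import shutil
import tempfile

def save_checkpoint(state: dict, directory: str, step: int) -> str:
    """Write training state to a checkpoint file and return its path."""
    path = os.path.join(directory, f"checkpoint-{step:06d}.json")
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump({"step": step, "state": state}, f)
    os.replace(tmp, path)  # swap in the finished file so a crash never leaves a half-written checkpoint
    return path

def latest_checkpoint(directory: str):
    """Return the most recent checkpoint (the known good state), or None to start fresh."""
    names = sorted(n for n in os.listdir(directory) if n.startswith("checkpoint-") and n.endswith(".json"))
    if not names:
        return None
    with open(os.path.join(directory, names[-1])) as f:  # zero-padded steps sort correctly
        return json.load(f)

def archive_checkpoints(local_dir: str, central_dir: str) -> None:
    """After training, copy checkpoints to central storage (a stand-in for an upload)."""
    os.makedirs(central_dir, exist_ok=True)
    for name in os.listdir(local_dir):
        if name.endswith(".json"):
            shutil.copy2(os.path.join(local_dir, name), central_dir)

with tempfile.TemporaryDirectory() as local_dir, tempfile.TemporaryDirectory() as central_dir:
    for step in (100, 200):
        save_checkpoint({"loss": 1.0 / step}, local_dir, step)
    resumed = latest_checkpoint(local_dir)
    print("resume from step", resumed["step"])  # resume from step 200
    archive_checkpoints(local_dir, central_dir)
```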

Step 6: Serve 

Once the training process is complete, the model can be exported to your central storage location. This will once again help document the system, and from there the model can then be uploaded to the local or cloud compute environment where it will be used.  

At this point you have a clean version of the source data, the checkpoints of the model created, and a copy of the model itself, all stored in your centralized location under your control and readily available should they be needed in the future. 

AI model inference

The term inference is derived from the AI model’s perspective. At a high level, when given a prompt, the model infers its response from what it learned during training. In simple terms, you’ve trained your model to recognize cats, and then you bring it new data (a picture of a family reunion) and ask your model if it sees any cats in the photo (I’m hoping the answer is yes). 

In AI, the prompt is viewed as new data, which the model evaluates against what it learned from its training data to determine a response, typically in the form of a decision, prediction, or new content, as is the case with generative AI models. 

An overview of the inference process is below:

A diagram of the AI inference process.

In some AI systems, the inference process flow includes additional code, such as filters, to help improve the model’s results. These filters can have a range of uses and can run at either the input or the output stage. For example, if you want to filter inappropriate queries or information, you could apply keyword filtering to the input (the prompt). Or you could add a toxicity detection filter on the output side, which reviews responses and prevents harmful or offensive content from being presented to the user.  
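The sketch below wires an input keyword filter and an output score-based filter around a model call. The blocked terms, the 0.8 threshold, and the `score_toxicity` callable are hypothetical placeholders; a real deployment would plug in an actual toxicity classifier and model.

```python
BLOCKED_TERMS = {"password dump", "credit card numbers"}  # hypothetical policy list

def input_filter(prompt: str) -> bool:
    """Keyword filter on the input side: True if the prompt may reach the model."""
    text = prompt.lower()
    return not any(term in text for term in BLOCKED_TERMS)

def output_filter(response: str, score_toxicity, threshold: float = 0.8) -> str:
    """Output-side filter: withhold responses whose toxicity score exceeds the threshold."""
    if score_toxicity(response) > threshold:
        return "[response withheld by content filter]"
    return response

def answer(prompt: str, model, score_toxicity) -> str:
    """Run the model only on allowed prompts, then filter what it returns."""
    if not input_filter(prompt):
        return "[prompt rejected by content filter]"
    return output_filter(model(prompt), score_toxicity)

# Stand-ins for a real model and a real toxicity classifier.
echo_model = lambda p: "You asked: " + p
always_calm = lambda text: 0.1
print(answer("What is a checkpoint?", echo_model, always_calm))
print(answer("Send me a password dump", echo_model, always_calm))
```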

A perhaps better understood problem that this kind of additional processing can address is how to get accurate, up-to-date information into the model’s response. On the input side, retrieval-augmented generation (RAG) retrieves relevant information from trusted sources that the user designates and supplies it to the trained model along with the prompt, so the response draws on that information. On the output side, you might add a hallucination detection filter, which aims to catch false or misleading information before it is presented to the user.  
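To illustrate the retrieval step, here is a toy RAG sketch that ranks a handful of trusted documents by word overlap with the question and places the best matches in the prompt. Production systems usually use embedding-based vector search rather than word overlap, and the document names and prompt wording here are hypothetical.

```python
def tokenize(text: str) -> set:
    """Lowercase words with simple punctuation stripped."""
    return {w.strip(".,?!:;").lower() for w in text.split()}

def retrieve(query: str, documents: dict, k: int = 2):
    """Return up to k (name, text) pairs from trusted documents, ranked by word overlap."""
    q = tokenize(query)
    ranked = sorted(documents.items(), key=lambda item: len(q & tokenize(item[1])), reverse=True)
    return [(name, text) for name, text in ranked[:k] if q & tokenize(text)]

def build_prompt(query: str, documents: dict, k: int = 2) -> str:
    """Augment the user's question with the retrieved context before calling the model."""
    context = "\n".join(f"[{name}] {text}" for name, text in retrieve(query, documents, k))
    return f"Answer using only the sources below.\n{context}\n\nQuestion: {query}"

trusted_docs = {
    "pricing": "Storage is billed per terabyte per month.",
    "support": "Support is available 24/7 by email.",
}
print(build_prompt("How is storage billed?", trusted_docs, k=1))
```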

More broadly, you’ll notice that both the prompt and response are saved. It is important to review this information periodically, especially if the model is public facing, if you are using a model that can change over time, such as a foundation model, or if you are using a model that uses RAG techniques to include new or external content. 

In all of those examples, your model can drift as new information is introduced, and, as we noted above, getting the right information and cleaning it properly is likely the most time-intensive and important stage of this process. Not for nothing is the phrase “knowledge is power” a truism—in the age of AI, knowledge is power and good data is king. 

The post AI 101: Building and Deploying an AI Model appeared first on Backblaze Blog | Cloud Storage & Cloud Backup

Seamless Data Migration with Custom Upload Timestamps

Post Syndicated from Bala Krishna Gangisetty original https://www.backblaze.com/blog/seamless-data-migration-with-custom-upload-timestamps/

A decorative image showing two cubes, representing data, moving from cloud to cloud. There are clocks above each cube.

Migrating data to the cloud? Ensuring that original timestamps remain intact through a cloud migration can be a critical factor for successful data management at scale. Losing these timestamps can lead to operational challenges that hinder your ability to track data effectively, set proper lifecycle rules, create custom events, and more.

Backblaze B2 Cloud Storage now offers the Custom Upload Timestamps feature to help you manage your data. Today, I’m sharing details on the new feature, benefits, and how to enable it.

What are Custom Upload Timestamps?

The Custom Upload Timestamps feature is designed specifically to retain the original timestamps of your files during a migration. It is especially beneficial for users who rely on lifecycle rules to delete or archive files based on age for compliance, who track file age manually, or who need to maintain the historical context of files.

Imagine this scenario: You have a critical file on another cloud storage provider, governed by a lifecycle rule that deletes it after 1,000 days. If you move the file to Backblaze B2 on day 999, the timestamp would be overwritten and you’d have to restart that lifecycle from day one. However, with this new feature, the original timestamp remains intact, and the file will still get deleted on day 1,000, just as planned. This capability not only simplifies the migration process, but also ensures continuity in your data retention policies, keeping your storage costs in line with expectations.
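The arithmetic behind that scenario is simple enough to sketch. The hypothetical helper below computes the days remaining under a generic age-based deletion rule, comparing a preserved original upload date with one that was reset on migration day.

```python
from datetime import date, timedelta

def days_until_deletion(upload_date: date, rule_days: int, today: date) -> int:
    """Days left before a rule that deletes files rule_days after upload fires."""
    age = (today - upload_date).days
    return max(rule_days - age, 0)

original_upload = date(2022, 1, 1)                     # hypothetical date on the old provider
migration_day = original_upload + timedelta(days=999)  # the file moves on day 999

print(days_until_deletion(original_upload, 1000, migration_day))  # 1: timestamp preserved
print(days_until_deletion(migration_day, 1000, migration_day))    # 1000: timestamp reset
```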

Benefits of Custom Upload Timestamps

Lifecycle rules play a crucial role in managing data retention, particularly when migrating large datasets. Losing the original timestamps means you’d have to manually reconfigure your rules or wait much longer for lifecycle events to take effect. The benefits of retaining original timestamps extend beyond just lifecycle rules.

Here is why this feature is essential:

  1. Operational efficiency: Knowing the original timestamp of files allows for better organization and tracking. This is vital for businesses that rely on historical data to inform decisions or manage projects. When timestamps reset, it can lead to confusion and disarray in managing files. You may find yourself dealing with files that should have been deleted or archived but aren’t because of the reset timeline.
  2. Compliance: For organizations that must adhere to regulatory standards for data retention, preserving timestamps can help meet legal requirements. It provides a clear audit trail and evidence of when files were created or modified.
  3. Decreased workload: Manually tracking and reconfiguring lifecycle rules consumes valuable time and resources. By retaining the original timestamps, you eliminate unnecessary workloads.
  4. File age tracking: Whether you’re managing backups, archival processes, or simple organizational tasks, knowing the age of a file can inform your decisions regarding when to review or delete files.
  5. Historical context: For projects that span long periods, retaining timestamps helps maintain the context of data. This can be critical for collaborative efforts or projects that require consistent documentation.

Ultimately, the Custom Upload Timestamps feature supports greater data portability, making it easier to move and manage large datasets. It ensures that migration to B2 Cloud Storage is as seamless as possible, without the need to reset or alter your data management policies. 

Ready to get started?

The Custom Upload Timestamps feature is enabled by default for all B2 Cloud Storage customers. To use it, include the X-Bz-Custom-Upload-Timestamp header when calling the b2_upload_file API. This simple addition lets you retain the original timestamp of your file, preserving its lifecycle state without interruption and keeping your data organized and easy to track.
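As a sketch, the Python below assembles `b2_upload_file` request headers that carry a file’s original modification time in milliseconds since the epoch. It assumes the file has been staged locally with its original modification time intact; when copying directly from another provider, you would take the timestamp from that provider’s object metadata instead. The upload authorization token comes from `b2_get_upload_url`, the HTTP request itself is omitted, and the helper name is hypothetical, so check the API documentation for the exact header requirements.

```python
import hashlib
import os
from urllib.parse import quote

def upload_headers(path: str, upload_auth_token: str, b2_file_name: str) -> dict:
    """Build b2_upload_file headers that carry the file's original modification time.

    upload_auth_token is the authorization token returned by b2_get_upload_url.
    Reads the whole file into memory, which is fine for a small-file sketch.
    """
    with open(path, "rb") as f:
        body = f.read()
    original_ms = int(os.path.getmtime(path) * 1000)  # milliseconds since the epoch
    return {
        "Authorization": upload_auth_token,
        "X-Bz-File-Name": quote(b2_file_name),        # file names are percent-encoded
        "Content-Type": "b2/x-auto",                  # let B2 infer the content type
        "Content-Length": str(len(body)),
        "X-Bz-Content-Sha1": hashlib.sha1(body).hexdigest(),
        "X-Bz-Custom-Upload-Timestamp": str(original_ms),
    }
```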

By retaining the original timestamps, Backblaze B2 helps increase the ease and granularity with which you can manage your data, especially for organizations migrating large volumes of data. You can transition your data while maintaining control over important metadata like the original timestamp, streamlining your operations, improving overall efficiency, and avoiding the stress of potential compliance issues.

What’s next?

To make the most of the Custom Upload Timestamps feature, consider the following actionable steps:

  1. Review your migration workflow. Before starting the migration, ensure that your upload scripts or API calls include the X-Bz-Custom-Upload-Timestamp header. This will help prevent any disruption in tracking important metadata.
  2. Test the feature. Conduct a pilot migration with a small number of files. This will allow you to confirm that the timestamps are retained correctly. Monitor the behavior of your data tracking after this test migration to ensure everything operates as expected.
  3. Verify lifecycle rules. Once you complete the migration, take the time to check that your lifecycle policies continue to function as intended on B2 Cloud Storage. This verification step is crucial to avoid unexpected data retention issues.
  4. Engage with Support. If you have any questions or encounter challenges, don’t hesitate to reach out to our Support team. We’re here to help you make the most of B2 Cloud Storage.

For more details, visit our API documentation to ensure you’re ready for a smooth migration. By leveraging the Custom Upload Timestamps feature, you can simplify your data management processes.

The post Seamless Data Migration with Custom Upload Timestamps appeared first on Backblaze Blog | Cloud Storage & Cloud Backup

Troubleshooting Disaster Recovery Scenarios: 10 Mistakes to Avoid

Post Syndicated from Kari Rivas original https://www.backblaze.com/blog/troubleshooting-disaster-recovery-scenarios-10-mistakes-to-avoid/

A decorative image showing a hammer smashing a drive.

When it comes to disaster recovery (DR), hope isn’t a plan. Yet I’ve seen the same story play out too many times: Companies find themselves scrambling when the unthinkable happens, discovering that their disaster recovery strategy is, well, full of holes. It’s like packing a parachute: You don’t want to find out what you missed when you’re already falling through the air. From my experience, there are some common mistakes businesses make that can turn a manageable problem into a fire drill. 

In this post, I’m sharing the top 10 disaster recovery mistakes I’ve come across when helping businesses think through their disaster recovery posture so that you can strengthen your own safety net. By avoiding these mistakes and implementing a comprehensive DR plan, you can ensure a rapid and efficient recovery from unforeseen disruptions.

1. Proximity paradox

A geographically close disaster recovery site offers limited protection. A natural disaster impacting your primary location could easily disable the nearby DR facility as well. And, if you don’t have a DR site, this could still apply to your business if you keep your backups nearby, such as in a tape storage facility down the road.

How Pittsburg State solved the proximity paradox

Pittsburg State University is located in Kansas in the heart of tornado alley. Disaster planning is nonnegotiable, and the university didn’t want to take chances with their data. See how they set up a robust private cloud with nodes across the state and backed all of their data up to immutable cloud storage with Backblaze B2.


2. Untested backups

Backups that haven’t been restored and verified are unreliable. Regularly test your backups to ensure a smooth recovery process during a disaster.
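One simple form of verification is to restore into a scratch location and compare checksums against the source. This sketch assumes the source data hasn’t changed since the backup was taken (otherwise, compare against checksums recorded at backup time); the function names are hypothetical.

```python
import hashlib
from pathlib import Path

def sha256_of(path: Path) -> str:
    """Hash a file in 1 MiB chunks so large files don't need to fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def verify_restore(source_dir: Path, restored_dir: Path) -> list:
    """Return relative paths that are missing or different after a test restore."""
    problems = []
    for src in sorted(p for p in source_dir.rglob("*") if p.is_file()):
        rel = src.relative_to(source_dir)
        dst = restored_dir / rel
        if not dst.is_file() or sha256_of(src) != sha256_of(dst):
            problems.append(str(rel))
    return problems
```

An empty list means every source file came back intact; anything else is a gap to fix before a real disaster exposes it.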

3. Replication trap

Relying solely on replication for DR creates a single point of failure. If your primary site is compromised, the replicated data at the DR site might be compromised as well. Off-site full and incremental backups are essential.

4. Paper plan peril

A DR plan gathering dust on a shelf is useless. Conduct regular drills to simulate disaster scenarios and expose weaknesses in your plan.

5. Snapshot snafu

Snapshots are not comprehensive backups. Using snapshots for long-term storage and retention introduces both technical and compliance risks in relation to how snapshots are managed. This affects both cloud and on-premises platforms.

6. SaaS surprises

Software as a service (SaaS) providers like Microsoft 365 and Google Workspace focus on high availability, but they operate on a shared responsibility model, meaning they may have limited built-in protection and recovery options. You may not be managing servers, but you do need a comprehensive data protection plan including regular, incremental backups outside of the SaaS platform.

7. Unforeseen force majeure

Disasters come in all shapes and sizes. Don’t limit your DR plan to common IT disruptions. Consider scenarios like widespread power outages or communication breakdowns, and plan accordingly. The goal is holistic cyber resilience—not only identifying threats and protecting against them, but also withstanding attacks as they’re happening and responding effectively.

8. Backup infiltration

Bad actors are increasingly targeting backups to increase the chances of a payout. Use immutable backups, which can’t be changed after creation, for an extra layer of protection against ransomware attacks.

9. Cloud drive disasters

Storing data on Google Drive, Dropbox, OneDrive, etc. is incredibly common. But these platforms do not protect against ransomware and provide limited point-in-time recovery options. Cloud drives are not a sufficient backup of your data.

10. Overlooking compliance

Factor in compliance needs when building your data protection and DR strategy. Regulations like HIPAA, GDPR, and others may have security or archival requirements that should be considered in your plan.

Invest in cyber resilience

After working in the disaster recovery space, I can tell you this: It’s not just about having a plan; it’s about having one that works when it counts. The mistakes I’ve covered here are common, but they’re also avoidable. Take the time to address these now, and you’re not only protecting your systems and data, but your company’s future. For me, a strong DR plan is an investment in resilience, and it’s there to catch you when you need it most.

The post Troubleshooting Disaster Recovery Scenarios: 10 Mistakes to Avoid appeared first on Backblaze Blog | Cloud Storage & Cloud Backup

5 Ways to Use Event Notifications to Advance Your Media Better, Faster

Post Syndicated from Jeremy Milk original https://www.backblaze.com/blog/5-ways-to-use-event-notifications-to-advance-your-media-better-faster/

A decorative image showing a cloud with digital lines and media icons.

In the hurry-up-and-wait world of media production, anything you can do to speed through the hurry-ups and avoid or shorten the waits is not just a gift—it’s an advantage that can mean happier team members, delighted clients and fans, and more revenue.

Backblaze Event Notifications can help. This new B2 Cloud Storage feature can help you streamline a range of production tasks, like automatically starting video transcoding and distributing new images, across your preferred workflow tools. 

Today, I’m sharing five ways you can use Backblaze Event Notifications to operationalize media production efficiencies. If you’re interested in Event Notifications for applications, check out this post; and stay tuned for a future post on how to use Event Notifications for IT backup.

Event Notifications for media production: Simplified automation

Event Notifications monitors your B2 Cloud Storage for data changes that you designate—think raw video uploads, content version updates, deletions, etc.—and delivers near real-time alerts where you want them about these changes. These alerts can be used to create awareness faster, and even more powerfully, to initiate streamlined end-to-end processes that can save you time and hassle, and avoid unnecessary manual tasks and/or the cost of complex intermediaries.

What are webhooks?

Webhooks, if you’re not familiar with the term, are HTTP-based callback functions that enable event-based communication between software applications. Backblaze Event Notifications can work with any external service that accepts webhooks. This means you can use it across your media production workflow, which is novel when most vendors’ alert features are limited to closed ecosystems or require significant, sometimes costly workarounds to communicate beyond a limited set of production tools.
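Because a webhook endpoint is reachable over HTTP, it should check that requests really come from the sender. The sketch below verifies an HMAC-SHA256 signature of the raw request body. Our understanding is that Backblaze sends such a signature in an `X-Bz-Event-Notification-Signature` header formatted as `v1=<hex digest>` when a signing secret is configured; treat that header name and format as assumptions to confirm in the Event Notifications documentation. The function name is hypothetical.

```python
import hashlib
import hmac

def is_valid_signature(body: bytes, signature_header: str, signing_secret: str) -> bool:
    """Check an HMAC-SHA256 signature of the raw request body.

    Assumes a header value of the form "v1=<hex digest>"; confirm the exact
    header name and format in the Event Notifications documentation.
    """
    scheme, _, received = signature_header.partition("=")
    if scheme != "v1" or not received:
        return False
    expected = hmac.new(signing_secret.encode(), body, hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected.encode(), received.encode())
```

Verify against the raw bytes you received, before parsing the JSON, since re-serializing the payload can change the bytes and break the signature.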

Top 5 use cases for media production

Here are specific, practical ways people producing and managing media can take advantage of Event Notifications for immediate benefits.

1. New content processing

Event Notifications can be used to trigger tasks immediately after new content is uploaded. Imagine one of your team members uploads a video or image: Event Notifications can be sent to a transcoding service to format it and to a tagging service to categorize it for better content organization. You can also set it up to extract valuable metadata, all in near real time, without manual intervention. 

General workflow (abbreviated)

By automating these processes, companies can ensure that user-generated content is formatted correctly, appropriately tagged, and moderated without delay. This not only saves time but also helps deliver a consistent user experience.

What’s more, you can even go full Jedi Knight and handle errors programmatically with Event Notifications logic that triggers reprocessing tasks whenever issues arise.
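A minimal sketch of the receiving side: parse a notification payload, route new uploads to downstream tasks by file extension, and retry a task if it fails. It assumes a payload with an `events` array whose entries include `eventType` (for example `b2:ObjectCreated:Upload`) and `objectName`; the routing table, task names, and retry helper are hypothetical.

```python
import json
import os

# Hypothetical routing table: file extension -> downstream tasks to start.
ROUTES = {
    ".mov": ["transcode", "extract_metadata"],
    ".mp4": ["transcode", "extract_metadata"],
    ".jpg": ["tag_image", "extract_metadata"],
}

def tasks_for_notification(payload: str):
    """Turn one notification payload into (task, objectName) pairs for new uploads."""
    tasks = []
    for event in json.loads(payload).get("events", []):
        if not event.get("eventType", "").startswith("b2:ObjectCreated:"):
            continue  # in this sketch, only newly created objects start processing
        name = event["objectName"]
        extension = os.path.splitext(name)[1].lower()
        tasks.extend((task, name) for task in ROUTES.get(extension, []))
    return tasks

def run_with_retry(task_fn, object_name, attempts=3):
    """Call task_fn(object_name), retrying on errors and re-raising after the last attempt."""
    for attempt in range(1, attempts + 1):
        try:
            return task_fn(object_name)
        except Exception:
            if attempt == attempts:
                raise

payload = json.dumps({"events": [{"eventType": "b2:ObjectCreated:Upload", "objectName": "clips/raw.MOV"}]})
print(tasks_for_notification(payload))  # [('transcode', 'clips/raw.MOV'), ('extract_metadata', 'clips/raw.MOV')]
```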

2. Integrated alerts in go-to tools

Event Notifications can easily integrate with your communication tools like Slack and productivity tools like Zapier, to inform internal and external stakeholders of updates without them needing to check for them manually. Users have told us this is a great way to keep people updated when assets are added, updated, or advanced to key stages in production and post cycles—setting them up to consider taking downstream actions that don’t lend themselves to further process automation.

Asset change announcement workflow

Additionally, for teams using workflow tools such as Zapier to connect various services, Event Notifications makes it simple to trigger actions across multiple platforms, enabling powerful, automated workflows with your data in B2 Cloud Storage.
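Slack incoming webhooks expect a JSON body with at least a `text` field, so unless an integration such as Zapier handles the translation, you may want a small relay that receives the notification and reformats it. The sketch below assumes that relay approach. The event field names (`eventType`, `bucketName`, `objectName`, `objectSize`) and the event-type prefixes are assumptions about the notification payload; the webhook URL is whatever your Slack workspace issues.

```python
import json
import urllib.request

# Assumed event-type prefixes; verify against the Event Notifications docs.
ACTIONS = {"b2:ObjectCreated": "added", "b2:ObjectDeleted": "deleted"}


def build_slack_message(event: dict) -> dict:
    """Translate one (assumed) notification event into a Slack message body."""
    action = next(
        (verb for prefix, verb in ACTIONS.items()
         if event["eventType"].startswith(prefix)),
        "updated",  # fallback for any other event type
    )
    size_mb = event.get("objectSize", 0) / 1_000_000
    return {
        "text": f"Asset {action} in {event['bucketName']}: "
        f"{event['objectName']} ({size_mb:.1f} MB)"
    }


def post_to_slack(webhook_url: str, message: dict) -> int:
    """POST a message to a Slack incoming webhook URL; return the HTTP status."""
    req = urllib.request.Request(
        webhook_url,
        data=json.dumps(message).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.status
```

Separating message formatting from delivery keeps the formatting easy to test and lets you swap Slack for another chat tool by replacing only `post_to_slack`.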

3. Over-the-top (OTT) streaming automation

Regardless of whether your streaming model is AVOD, TVOD, or SVOD, Event Notifications can help automate processing and distribution workflows. For example, users can set them up so that every time a new title is added to B2 Cloud Storage, an alert initiates transcoding, compression, and preparation for delivery or playback via a content delivery network (CDN).

OTT streaming platform workflow

4. Backup completion monitoring

An important (if unglamorous) aspect of managing media is backing it up for extra safekeeping. After all, it’s a precious asset worth a lot of money now and later. So whether you back up nightly, monthly, at a project’s end, or on some other cadence, Event Notifications lets you receive updates when your media backups are successfully uploaded to a Backblaze B2 Bucket, providing peace of mind that your data is protected.

We’ve also had a few users already tell us that not seeing backup completion alerts when expected helped them realize that they had other, previously unknown workflow hiccups to address.
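Because a missing notification can be as informative as one that arrives, a scheduled check can compare the time of the last backup-completion notification with your expected cadence. This is a minimal sketch, assuming you record the timestamp of each relevant notification somewhere your scheduler can read it; the grace period is an arbitrary example value.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def backup_overdue(
    last_completed: Optional[datetime],
    expected_every: timedelta,
    grace: timedelta,
    now: Optional[datetime] = None,
) -> bool:
    """Return True if no backup-completion notification arrived when expected.

    last_completed: timestamp of the most recent backup upload notification,
    or None if one has never been received.
    """
    now = now or datetime.now(timezone.utc)
    if last_completed is None:
        return True
    return now - last_completed > expected_every + grace


if __name__ == "__main__":
    last = datetime(2024, 11, 1, 2, 0, tzinfo=timezone.utc)
    check = datetime(2024, 11, 2, 6, 0, tzinfo=timezone.utc)
    if backup_overdue(last, timedelta(days=1), timedelta(hours=2), now=check):
        print("Nightly backup notification is overdue; investigate the workflow.")
```

Run from cron or a similar scheduler, a check like this turns "we never got the alert" into an alert of its own.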

Backup complete workflow

Tangentially related, media organizations are also using Backblaze Cloud Replication to programmatically store their content in two or more geographically distributed locations for added protection. This isn’t the same as Event Notifications, but it is another automation tool for strengthening your protection posture.

5. Monitor data usage

Since Event Notifications messages are sent within seconds of files being uploaded and deleted, and they contain the size of the file in question, you can easily and reliably track your data usage in near real time, helping you identify trends and potential issues. For example, if you are expecting large raw files but the messages report much smaller files than expected, that discrepancy can prompt you to begin QC on them.
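A minimal sketch of that kind of tracking: keep a running byte total from create and delete events and flag uploads that fall below the size you expect for a given feed. It assumes events carry `eventType`, `objectName`, and `objectSize` fields and that create and delete types begin with `b2:ObjectCreated` and `b2:ObjectDeleted`; check the documentation for the actual names. The threshold is hypothetical, and this sketch does not deduplicate repeated deliveries.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class UsageTracker:
    """Tracks net stored bytes and flags uploads smaller than expected."""

    min_expected_bytes: int  # assumption: smallest plausible size for this feed
    total_bytes: int = 0
    flagged: List[str] = field(default_factory=list)

    def handle(self, event: dict) -> None:
        size = event.get("objectSize", 0)
        if event["eventType"].startswith("b2:ObjectCreated"):
            self.total_bytes += size
            if size < self.min_expected_bytes:
                self.flagged.append(event["objectName"])  # queue for QC
        elif event["eventType"].startswith("b2:ObjectDeleted"):
            self.total_bytes -= size
```

In practice you would probably keep one tracker per bucket or per ingest prefix, since a sensible minimum size for camera originals differs from one for stills or proxies.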

We’ve also seen this kind of data monitoring prove highly beneficial to the IT personnel who support production teams, because near real-time monitoring allows faster responses to situations as they happen, mitigating risks, reducing costs, and nipping issues in the bud so production teams remain free of disruption and distraction.

Monitoring workflow

Beyond these example use cases, Event Notifications opens up a wide range of possibilities for automating and optimizing workflows. This flexibility makes it easy to automate how your infrastructure interacts with and reacts to file changes in B2 Cloud Storage, simplifying workflows across your distributed services. So go ahead and get creative—and please do share with us the cool things you’re doing with Event Notifications.

Why Event Notifications matter for production workflows

The benefits of real-time notifications extend beyond simply saving time—they transform the way teams work, automate processes, and reduce the margin for error.

  • Awareness: Instant notifications for uploads, updates, or deletions keep everyone on the same page.
  • Actionable insights: Real-time alerts provide critical information that helps make informed decisions quickly.
  • Flexibility: Direct connections to services like media asset managers (MAMs), transcoding applications, and CDNs mean more choice to stick with your preferred stack and less lock-in to specific vendors or tools.
  • Cost efficiency: Automating tasks like media transcoding, data processing, or content delivery reduces the need for manual labor, saving on operational costs and freeing up resources for other strategic initiatives.
  • Improved security: By instantly alerting teams to changes or unusual activity, Event Notifications help maintain data integrity and support proactive security measures.

How Event Notifications compares

Unlike offerings such as Amazon’s messaging services, which are limited to specific ecosystems, Backblaze Event Notifications integrates directly with any service that accepts webhooks, offering true flexibility and avoiding vendor lock-in.

Event Notifications is also designed for at-least-once delivery, ensuring critical notifications are not missed. This reliability is important for teams building workflows that require precision and a level of consistency their end users expect. 
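At-least-once delivery means the same notification can occasionally arrive more than once, so handlers should be idempotent. One simple approach, sketched below, is to remember recently seen event IDs and skip repeats. It assumes each event carries a unique identifier (called `eventId` in the usage example; check the documentation for the actual field name). The window is in memory and bounded, so a restart forgets history, and multiple workers would need a shared store instead.

```python
from collections import OrderedDict


class EventDeduplicator:
    """Remembers recently seen event IDs so repeated deliveries are skipped."""

    def __init__(self, max_ids: int = 100_000) -> None:
        self.max_ids = max_ids
        self._seen = OrderedDict()  # event ID -> None, oldest first

    def is_new(self, event_id: str) -> bool:
        """Return True the first time an ID is seen, False for repeats."""
        if event_id in self._seen:
            self._seen.move_to_end(event_id)  # keep hot IDs from being evicted
            return False
        self._seen[event_id] = None
        if len(self._seen) > self.max_ids:
            self._seen.popitem(last=False)  # evict the oldest ID
        return True


if __name__ == "__main__":
    dedupe = EventDeduplicator()
    for event in [{"eventId": "e1"}, {"eventId": "e2"}, {"eventId": "e1"}]:
        if dedupe.is_new(event["eventId"]):
            print("processing", event["eventId"])
```

Where possible, making the downstream action itself idempotent (for example, writing output to a key derived from the source object) is an even sturdier complement to this check.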

The pricing for Event Notifications is simple and transparent. Backblaze B2 Reserve customers enjoy unlimited free Event Notifications, while pay-as-you-go Backblaze B2 customers enjoy 2,500 calls per day free and then $0.004 per 10,000 transactions. This straightforward pricing applies no matter the service receiving the notification. This enables businesses to confidently scale their event-driven workflows, knowing exactly what to expect in terms of costs, regardless of the services they choose to integrate with. 
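For pay-as-you-go accounts, the arithmetic is easy to script when estimating costs. This sketch uses the rates quoted above and assumes the free allowance resets each day and that billable calls are prorated linearly rather than rounded up to blocks of 10,000; check your invoice terms to confirm. It does not apply to B2 Reserve customers, whose notifications are free.

```python
FREE_CALLS_PER_DAY = 2_500
USD_PER_10K_CALLS = 0.004  # pay-as-you-go rate quoted above


def daily_cost(calls: int) -> float:
    """Estimated USD for one day of notifications.

    Assumes the free allowance resets daily and billable calls are
    prorated linearly rather than rounded up to blocks of 10,000.
    """
    billable = max(0, calls - FREE_CALLS_PER_DAY)
    return billable / 10_000 * USD_PER_10K_CALLS


def monthly_cost(calls_per_day: int, days: int = 30) -> float:
    """Estimated USD for `days` days at a steady daily call volume."""
    return daily_cost(calls_per_day) * days


if __name__ == "__main__":
    print(f"${monthly_cost(102_500):.2f} per 30 days")  # $1.20
```

At 102,500 notifications a day, 100,000 are billable, which works out to 4 cents a day, or about $1.20 over 30 days.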

Ready to add automation to your media tasks?

For existing customers working with a Backblaze account manager, Event Notifications is already enabled for you, and your account manager can assist with any questions. If you’re an existing customer not currently working with an account manager, please contact our Support team to request access to Event Notifications. 

New customers can contact our Sales team to learn more about how Event Notifications can streamline workflows and how to get started.

Once Event Notifications are enabled, log in to your Backblaze B2 account, navigate to the Buckets page, and click on the Event Notifications section. From there, you can set up notification rules for the events you want to track or configure notifications using our API.

For detailed instructions and best practices, visit our Event Notifications documentation.

The post 5 Ways to Use Event Notifications to Advance Your Media Better, Faster appeared first on Backblaze Blog | Cloud Storage & Cloud Backup

7 Ways to Use Event Notifications to Streamline Application Workflows

Post Syndicated from Amrit Singh original https://www.backblaze.com/blog/7-ways-to-use-event-notifications-to-streamline-application-workflows/

A decorative image showing a cloud with an alert symbol.

Event-driven infrastructure is at the core of modern application development. It helps businesses streamline processes like transcoding user-uploaded video or processing images for tagging, kick off downstream workflows immediately, and reduce complexity by automating multi-step processes across distributed services.

Today, I’m sharing seven ways you can use Backblaze Event Notifications to accelerate application workflows, automate processes, streamline operations, and scale revenue. If you’re interested in Event Notifications, but you’re not using it to run applications, stay tuned for future posts sharing use cases for media management and backup and archive. 

Event Notifications for applications: Simplified automation

Event Notifications delivers near real-time alerts for changes in B2 Cloud Storage, simplifying workflows across the services that interact with your stored data. Teams can use Event Notifications to create end-to-end processes that scale efficiently and integrate directly with any external service that accepts webhooks. This means no more manual monitoring of storage or relying on complex intermediaries.

What are webhooks?

Webhooks, if you’re not familiar, are a way for applications to communicate with each other by automatically sending data when specific events occur, typically as HTTP POST requests with a JSON payload. Notably, our Event Notifications feature isn’t limited to a closed ecosystem or subset of business tools.
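To show what "accepts webhooks" means on the receiving side, here is a minimal receiver built on Python’s standard `http.server`. It assumes the notification body is shaped like `{"events": [...]}` with `eventType` and `objectName` fields in each event; treat that shape as an assumption and confirm it in the Event Notifications documentation. A production endpoint would also sit behind HTTPS and authenticate incoming requests.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def parse_events(body: bytes) -> list:
    """Return the event objects from a webhook body.

    Assumes a payload shaped like {"events": [{...}, ...]}; anything else
    yields an empty list.
    """
    payload = json.loads(body)
    if not isinstance(payload, dict):
        return []
    return [e for e in payload.get("events", []) if isinstance(e, dict)]


class NotificationHandler(BaseHTTPRequestHandler):
    def do_POST(self) -> None:
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)
        try:
            events = parse_events(body)
        except ValueError:  # malformed JSON or undecodable bytes
            self.send_response(400)
            self.end_headers()
            return
        for event in events:
            # Keep this fast: hand real work off to a queue or task runner.
            print(event.get("eventType"), event.get("objectName"))
        self.send_response(200)  # acknowledge receipt
        self.end_headers()


def serve(port: int = 8080) -> None:
    """Run the receiver until interrupted (plain HTTP; put TLS in front)."""
    HTTPServer(("", port), NotificationHandler).serve_forever()
```

Start it with `serve(8080)` and point a notification rule at the endpoint’s public URL. Acknowledging quickly and doing heavy work elsewhere is a common webhook design choice, since senders typically apply timeouts to each delivery.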

Automating common application tasks with Event Notifications allows you to reduce operational overhead by minimizing manual monitoring, accelerate processes across integrations with your preferred services, and reduce manual entry errors that can cost enterprises time and money. 

Top 7 use cases for applications

Let’s explore some practical ways Event Notifications can be leveraged within your tech stack:

1. User-generated content processing

For applications dealing with user-uploaded content, Event Notifications can be used to trigger tasks immediately upon data upload. Imagine a user uploading a video or image: An Event Notification could be sent to a transcoding service to format it, a tagging service to categorize it, or even a moderation tool to ensure it complies with your community standards—all in near real time, without manual intervention.

Social platform workflow

By automating these processes, companies can ensure that user-generated content is formatted correctly, appropriately tagged, and moderated without delay. This not only saves time but guarantees a consistent user experience. 

2. Integrated alerts with automation tools

Event Notifications can easily integrate with productivity tools like Slack and Zapier, or any service that accepts a webhook, giving your whole team awareness of changes in your storage environment without manual checks. This keeps teams informed and ready to respond immediately to critical events.

Asset tracking and monitoring workflow

Additionally, for teams using workflow platforms such as Zapier to connect various services, Event Notifications makes it simple to trigger actions across multiple platforms, enabling powerful, automated workflows with your data in B2 Cloud Storage.

3. Surveillance and streaming automation

For applications managing large video files, such as surveillance or streaming platforms, Event Notifications can help automate the processing and distribution workflows. Videos can be transcoded, compressed, and prepared for delivery or playback promptly.

Streaming platform workflow

This automation is also useful for time-sensitive content, where quick turnaround is essential. Automating video processing reduces the manual effort involved and ensures content is always ready for viewing in the preferred format as soon as it’s available.

4. AI workload automation

For businesses building AI applications, Event Notifications can be used to trigger AI workloads in real time, enabling faster processing and response. For instance, when new data is uploaded, alerts can trigger downstream services to process that data, such as converting images to text or analyzing content for insights. 

AI image to text workflow

With this AI workflow, tasks start the moment data becomes available. Whether you’re running an image recognition service, analyzing datasets, or building AI models, Event Notifications eliminates the delays that come with manual processing. No matter what your downstream service is, Event Notifications provides the flexibility to integrate seamlessly with your AI workflows, improving real-time processing capabilities and enabling teams to focus on delivering better solutions rather than managing manual data flow.
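AI jobs such as image-to-text can take far longer than a webhook sender is likely to wait, so a common pattern is to have the webhook handler enqueue the event and let a background worker run the model. A minimal sketch with Python’s `queue` and `threading` modules; `run_image_to_text` is a hypothetical placeholder for your model or service call, and the `eventType` and `objectName` fields are assumptions about the payload.

```python
import queue
import threading

jobs = queue.Queue()  # events waiting for the AI step
results = []          # stand-in for wherever outputs are stored

IMAGE_SUFFIXES = (".png", ".jpg", ".jpeg")


def run_image_to_text(object_name: str) -> str:
    """Placeholder for a hypothetical image-to-text service call."""
    return f"text extracted from {object_name}"


def on_notification(event: dict) -> None:
    """Called by the webhook endpoint: enqueue matching uploads and return."""
    is_upload = event["eventType"].startswith("b2:ObjectCreated")
    if is_upload and event["objectName"].lower().endswith(IMAGE_SUFFIXES):
        jobs.put(event)


def worker() -> None:
    """Consume queued events until a None sentinel arrives."""
    while True:
        event = jobs.get()
        try:
            if event is None:
                return
            results.append(run_image_to_text(event["objectName"]))
        finally:
            jobs.task_done()
```

Start `worker` in a background thread (for example, `threading.Thread(target=worker, daemon=True).start()`), call `on_notification` from the webhook handler, and put `None` on the queue to shut down. For heavier workloads, the same shape maps onto a dedicated job queue service.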

5. Monitor data usage

Since Event Notifications messages are sent within seconds of files being uploaded and deleted, and contain the size of the file in question, you can easily and reliably track your data usage in near real time, helping you identify trends and potential issues.

Monitoring workflow

In contrast with periodic usage reports, near real-time monitoring allows you to respond to situations as they are happening, mitigating risks and potentially reducing costs.

6. Respond to security events

Event Notifications can feed near real-time data to security information and event management (SIEM) systems, allowing you to detect and respond to anomalous access patterns as they are happening.

Security alert workflow

Event Notifications allows you to take a proactive, rather than reactive, security posture, again mitigating risks and reducing costs.
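In practice you would forward events to your SIEM and let its rules do the detecting, but the kind of rule involved is easy to sketch. This toy detector flags a burst of deletions within a sliding time window. The threshold and window are arbitrary example values, event types beginning with `b2:ObjectDeleted` are assumed to mark deletions, and events are assumed to arrive roughly in timestamp order.

```python
from collections import deque


class DeleteBurstDetector:
    """Flags when more than `threshold` deletions occur within `window_s` seconds."""

    def __init__(self, threshold: int, window_s: float) -> None:
        self.threshold = threshold
        self.window_s = window_s
        self._times = deque()  # timestamps of recent deletions, oldest first

    def observe(self, event_type: str, timestamp_s: float) -> bool:
        """Record an event; return True if the deletion rate looks anomalous."""
        if not event_type.startswith("b2:ObjectDeleted"):
            return False
        self._times.append(timestamp_s)
        # Drop deletions that have fallen out of the sliding window.
        while self._times and timestamp_s - self._times[0] > self.window_s:
            self._times.popleft()
        return len(self._times) > self.threshold
```

A sudden spike in deletions is one classic sign of ransomware or a compromised credential, which is why rate-based rules like this are a common starting point.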

7. Automatically trigger data integration

Event Notifications enable your data integration workloads to run within seconds of new data being uploaded to Backblaze B2, continuously delivering data to analytical systems and dashboards, giving you a live view of the state of your business.

Data integration workflow

Delivering data to dashboards within seconds or minutes of its availability enables near real-time insights, faster decision-making, and the ability to react to events as they occur.
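Loading each event individually into an analytics system can be inefficient, so a common pattern is micro-batching: buffer notifications briefly, then load them together. Here is a minimal sketch. `flush` stands in for a hypothetical loader (one that writes rows to your warehouse, for example), and the size and age limits are illustrative.

```python
import time
from typing import Any, Callable, List, Optional


class MicroBatcher:
    """Buffer events and flush them in batches when full or old enough."""

    def __init__(
        self,
        flush: Callable[[List[Any]], None],
        max_size: int = 500,
        max_age_s: float = 5.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.flush = flush
        self.max_size = max_size
        self.max_age_s = max_age_s
        self.clock = clock
        self._buf: List[Any] = []
        self._first_at: Optional[float] = None

    def add(self, event: Any) -> None:
        if not self._buf:
            self._first_at = self.clock()  # age is measured from the first event
        self._buf.append(event)
        too_old = self.clock() - self._first_at >= self.max_age_s
        if len(self._buf) >= self.max_size or too_old:
            self.drain()

    def drain(self) -> None:
        """Flush whatever is buffered (call this on a timer and at shutdown)."""
        if self._buf:
            batch, self._buf = self._buf, []
            self._first_at = None
            self.flush(batch)
```

The age check only runs when a new event arrives, so a quiet period could leave a partial batch waiting; calling `drain()` on a timer covers that case. The injectable `clock` makes the timing easy to test.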

Beyond these example use cases, Event Notifications opens up a wide range of possibilities for automating and optimizing workflows. You can use Event Notifications to automate metadata extraction and tagging for better content organization, and handle errors programmatically by triggering reprocessing tasks whenever issues arise. This flexibility makes it easy to automate how your infrastructure interacts with and reacts to data changes in B2 Cloud Storage, simplifying workflows across your distributed services.

Why Event Notifications matter for application workflows

The benefits of real-time notifications extend beyond simply saving time—they transform the way teams work, automate processes, and reduce the margin for error.

  • Awareness: Instant notifications for data changes, uploads, or deletions keep everyone on the same page.
  • Actionable insights: Whether it’s confirming a successful upload or catching an unexpected change, real-time alerts provide critical information that helps make informed decisions quickly.
  • Flexibility: Direct connections to services like transcoding, compute, or serverless applications mean more choice and less lock-in to specific vendors or tools.
  • Improved security: By instantly alerting teams to unauthorized changes or unusual activity, Event Notifications help maintain data integrity and support proactive security measures.
  • Cost efficiency: Automating tasks like media transcoding, data processing, or content delivery reduces the need for manual labor, saving on operational costs and freeing up resources for other strategic initiatives.

How Event Notifications compares

Unlike offerings such as Amazon’s messaging services, which are limited to specific ecosystems, Event Notifications integrates directly with any service that accepts webhooks, offering true flexibility and avoiding vendor lock-in.

Event Notifications is also designed for at-least-once delivery, ensuring critical notifications are not missed. This reliability is important for teams building workflows that require precision and a level of consistency their end users expect. 

The pricing for Event Notifications is simple and transparent, with 2,500 calls per day free, and just $0.004 per 10,000 transactions. This straightforward pricing applies no matter the service receiving the notification. This enables businesses to confidently scale their event-driven workflows, knowing exactly what to expect in terms of costs, regardless of the services they choose to integrate with. 

Ready to add automation to your application?

For existing customers working with a Backblaze account manager, Event Notifications is already enabled for you, and your account manager can assist with any questions. If you’re an existing customer not currently working with an account manager, please contact our Support team to request access to Event Notifications. 

New customers can contact our Sales team to learn more about how Event Notifications can streamline workflows and how to get started.

Once Event Notifications are enabled, log in to your Backblaze B2 account, navigate to the Buckets page, and click on the Event Notifications section. From there, you can set up notification rules for the events you want to track or configure notifications using our API.

For detailed instructions and best practices, visit our Event Notifications documentation.

The post 7 Ways to Use Event Notifications to Streamline Application Workflows appeared first on Backblaze Blog | Cloud Storage & Cloud Backup

Backblaze Rate Limiting Policy for Consistent Performance

Post Syndicated from Jeremy Milk original https://www.backblaze.com/blog/rate-limiting-policy/

A decorative image showing clouds connected by digital lines.

Highways have lanes for a reason. The lanes help ensure that large volumes of traffic can reach their destinations quickly and safely. And they support order and predictability in systems where some folks want (or need) to go NASCAR fast and others like myself a little less so.

Backblaze is now applying the same fundamental highway engineering thinking to the B2 Cloud Storage platform by introducing a rate limiting policy. The policy is designed to support different types of customers and usage demands, so everyone can continue to enjoy the high-performance storage they need, while protecting all customers from the risk that any one customer or set of customers creates a traffic pileup for everyone else.

In practical terms, the new Backblaze policy prevents unexpected API usage spikes by limiting each customer’s call and byte rates to set thresholds over a given period of time. If a customer exceeds the rate, they will temporarily receive a 503 status code when using our S3 compatible API, or a 429 status code when using our Backblaze B2 native API. This is similar to policies and status codes you’ve seen from other global cloud object storage providers, including Amazon Web Services S3 and Microsoft Azure.
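Clients should treat those status codes as a signal to slow down and retry. A common approach is exponential backoff with jitter, sketched below with Python’s standard library. Whether these responses include a `Retry-After` header isn’t stated here, so the code simply honors one if present. Official SDKs and tools often include their own retry logic, so check yours before adding another layer.

```python
import random
import time
import urllib.error
import urllib.request
from typing import Optional

# 503 from the S3 compatible API, 429 from the B2 native API.
RATE_LIMITED = {429, 503}


def backoff_delay(attempt: int, retry_after: Optional[str] = None,
                  base: float = 1.0, cap: float = 64.0) -> float:
    """Seconds to wait before retry number `attempt` (0-based)."""
    if retry_after is not None:
        try:
            return float(retry_after)  # honor a numeric Retry-After if present
        except ValueError:
            pass  # HTTP-date form is not handled in this sketch
    return random.uniform(0, min(cap, base * 2 ** attempt))  # "full jitter"


def fetch_with_retries(req: urllib.request.Request, max_attempts: int = 5) -> bytes:
    """Send `req`, retrying only on rate-limit responses."""
    for attempt in range(max_attempts):
        try:
            with urllib.request.urlopen(req, timeout=30) as resp:
                return resp.read()
        except urllib.error.HTTPError as err:
            if err.code not in RATE_LIMITED or attempt == max_attempts - 1:
                raise
            time.sleep(backoff_delay(attempt, err.headers.get("Retry-After")))
    raise ValueError("max_attempts must be at least 1")
```

Randomizing the wait ("jitter") keeps many clients that were limited at the same moment from retrying in lockstep and tripping the limit again.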

Based on our analysis of customer usage patterns, we are confident that the overwhelming majority of you will not reach rate limit thresholds. For you, it’s just smooth sailing, or open road if we stick to the highway analogy.

Backblaze can also assist customers that need a limit increase, for performance and proof of concept (PoC) testing, recovery and restore, and/or anticipated everyday needs.

Drill-down details:

  • All Backblaze B2 customers will be under the governance of the policy after it is rolled out across the platform. Backblaze Computer Backup usage is not within the scope of this policy.
  • Customers will be assigned different default rate limits based on account history and usage patterns, as well as information gleaned during sales-assisted implementation and renewal planning discussions.
  • New, self-service customers with smaller datasets stored will initially be provisioned for uploads up to 50 requests and 100MB per second, and for downloads up to 20 requests and 25MB per second, all per account. Other API operations may also be limited to keep traffic flowing, but again, this won’t be noticeable to most customers.
  • Customers with larger datasets and all sales-assisted customers whom we’ve supported during implementation and/or renewal will be provisioned with significantly higher limits and can be eligible for custom limits.
  • Traffic analysis and engineering is a dynamic activity, so we’ll likely revise limits over time in response to evolving usage patterns, improvements we roll out, and, of course, customer feedback. We will announce significant changes here on the blog.
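If you want a client to pace itself below a known limit rather than rely only on retries, a token bucket is the standard technique. This sketch is a client-side limiter, not Backblaze’s server-side implementation. The figures in the usage note come from the new self-service defaults above; your account’s limits may differ.

```python
import time
from typing import Callable


class TokenBucket:
    """Client-side limiter: refills at `rate` tokens/s, holds up to `capacity`."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.clock = clock
        self.updated = clock()

    def _refill(self) -> None:
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now

    def try_acquire(self, amount: float = 1.0) -> bool:
        """Take `amount` tokens if available; never blocks."""
        self._refill()
        if self.tokens >= amount:
            self.tokens -= amount
            return True
        return False

    def acquire(self, amount: float = 1.0) -> None:
        """Block until `amount` tokens are available, then take them."""
        if amount > self.capacity:
            raise ValueError("amount exceeds bucket capacity; split the work")
        while not self.try_acquire(amount):
            time.sleep(max((amount - self.tokens) / self.rate, 0.001))
```

For example, `TokenBucket(rate=50, capacity=50)` for upload requests, plus `TokenBucket(rate=100_000_000, capacity=100_000_000)` for upload bytes (calling `acquire(len(chunk))` per chunk and treating MB as 10^6 bytes), would keep a single process at or below those defaults. Because the limits apply per account, several processes sharing one account would need to divide the budget between them.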

You can visit our API documentation for more information. Please also don’t hesitate to contact our support team with any questions and/or to proactively talk about the right rate limits to serve your unique needs.

The post Backblaze Rate Limiting Policy for Consistent Performance appeared first on Backblaze Blog | Cloud Storage & Cloud Backup

Backblaze Drive Stats for Q3 2024

Post Syndicated from Andy Klein original https://www.backblaze.com/blog/backblaze-drive-stats-for-q3-2024/

A decorative image that displays the words Q3 2024 Drive Stats.

As of the end of Q3 2024, Backblaze was monitoring 292,647 hard disk drives (HDDs) and solid state drives (SSDs) in our cloud storage servers located in our data centers around the world. We removed from this analysis 4,100 boot drives, consisting of 3,344 SSDs and 756 HDDs. This leaves us with 288,547 hard drives under management to review for this report. We’ll review the annualized failure rates (AFRs) for Q3 2024 and the lifetime AFRs of the qualifying drive models. Along the way, we’ll share our observations and insights on the data presented and, as always, we look forward to you doing the same in the comments section at the end of the post.

Hard drive failure rates for Q3 2024

For our Q3 2024 quarterly analysis, we remove the following from consideration: drive models which did not have at least 100 drives in service at the end of the quarter, drive models which did not accumulate 10,000 or more drive days during the quarter, and individual drives which exceeded their manufacturer’s temperature specification during their lifetime. The removed pool totalled 471 drives, leaving us with 288,076 drives grouped into 29 drive models for our Q3 2024 analysis. 

The table below lists the AFRs and related data for these drive models. The table is sorted ascending by drive size then ascending by AFR within drive size.

Notes and observations on the Q3 2024 Drive Stats

  • Upward AFR. The quarter-to-quarter AFR continues to creep up, rising from 1.71% in Q2 2024 to 1.89% in Q3 2024. The rise can’t be attributed to the aging 4TB drives, as our CVT drive migration system continues to replace them; the AFR for the remaining 4TB drives was just 0.26% in Q3. The primary culprit is the collection of 8TB drives, which are now on average over seven years old. As a group, the AFR for the 8TB drives rose to 3.04% in Q3 2024, up from 2.31% in Q2. The CVT team is gearing up to begin migrating the 8TB drives over the next few months.
  • Yet another golden oldie is gone. You may have noticed that the 4TB Seagate drives (model: ST4000DM000) are missing from the table. All of the Backblaze Vaults containing these drives have been migrated, and as a consequence there are only two of these drives remaining, not enough to make the quarterly chart. You can read more about their demise in our recent Halloween post. 
  • A new drive in town. In Q3, the 20TB Toshiba drives (model: MG10ACA20TE) arrived in force, populating three complete Backblaze Vaults of 1,200 drives each. Over the last few months, our drive qualification team put the 20TB drive model through its paces, and having passed the test, it is now on the list of drive models we can deploy.
  • One zero. For the second quarter in a row, the 16TB Seagate drive model (model: ST16000NM00J) had zero failures. With only 185 drives in service, there is a lot of potential variability in the future, but for the moment, they are settling in quite well.
  • The nine year club. There are no data drives with 10 or more years of service, but there are 39 drives that are nine years or older. They are all 4TB HGST drives (model: HMS5C4040ALE640) spread across 31 different Storage Pods, in five different Backblaze Vaults and two different data centers. Will any of those drives make it to 10 years? Probably not, given that four of the five vaults have started their CVT migrations and will be gone by the end of the year. And, while the fifth vault is not scheduled for migration yet, it is just a matter of time before all of the 4TB drives we are using will be gone.

Reactive and proactive drive failures

In the Drive Stats dataset schema, there is a field named failure, which displays either a 1 for failure or a 0 for not failed. Over the years in various posts, we have stated that for our purposes drive failure is either reactive or proactive. Furthermore, we have suggested that failed drives fall basically evenly into these two categories. We’d like to put some data behind that 50/50 number, but first let’s start by defining our two categories of drive failure, reactive and proactive. 

  • Reactive: A reactive failure is when any of the following conditions occur: the drive crashes and refuses to boot or spin up, the drive won’t respond to system commands, or the drive won’t stay operational. 
  • Proactive: A proactive failure is generally anything not a reactive failure, and typically is when one or more indicators such as SMART stats, FSCK (file system) checks, etc., signal that the drive is having difficulty and drive failure is highly probable. Typically a multitude of indicators are present in drives declared as proactive failures.

A drive that is removed and replaced as either a proactive or reactive failure is considered a drive failure in Drive Stats unless we learn otherwise. For example, suppose a drive is experiencing communication errors and command timeouts and is scheduled for a proactive drive replacement. During the replacement process, the data center tech realizes the drive does not appear to be fully seated. After the tech gently secures the drive, further testing reveals no issues, and the drive is no longer considered failed. At that point, the Drive Stats dataset is updated accordingly.

As noted above, the Drive Stats dataset includes the failure status (0 or 1) but not the type of failure (proactive or reactive). That’s a project for the future. To get a breakdown of the different types of drive failures, we have to interrogate the maintenance ticketing system each data center uses to record maintenance activities on Storage Pods and related equipment. Historically, the drive failure data was not readily accessible, but a recent software upgrade now gives us access to this data for the first time. So in the spirit of Drive Stats, we’d like to share our drive failure types with you.

Drive failure type stats

Q3 2024 will be our starting point for any drive failure type stats we publish going forward. For consistency, we will use the same drive models listed in the Drive Stats quarterly report, in this case Q3 2024. For this period, there were 1,361 drive failures across 29 drive models. 

We actually have been using the data center maintenance data for several years as each quarter we validate the failed drives reported by the Drive Stats system with the maintenance records. Only validated failed drives are used for the Drive Stats reports we publish quarterly and in the data we publish on our Drive Stats webpage.

The recent upgrades to the data center maintenance ticketing system have not only made the drive failure validation process easier, but also let us easily join the two sources together. This gives us the ability to look at the drive failure data across several different attributes, as shown in the tables below. We’ll start with the number of failed drives in each category and go from there. This will form our baseline data.
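The join itself can be sketched in a few lines. Drive Stats daily files do include `serial_number`, `model`, and `failure` columns; the maintenance ticket fields used here (`serial_number`, `failure_type`) are hypothetical, since the ticketing system’s schema isn’t public. Failed drives with no matching ticket are returned separately, mirroring the validation step described above.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def failure_type_breakdown(
    drive_rows: Iterable[dict], tickets: Iterable[dict]
) -> Tuple[Counter, List[str]]:
    """Join failed drives to maintenance tickets and tally failure types.

    drive_rows: dicts with Drive Stats-style keys serial_number, model, failure.
    tickets: dicts with hypothetical keys serial_number and failure_type
             ("reactive" or "proactive").
    Returns (counts_by_type, unmatched_serials).
    """
    ticket_by_serial = {t["serial_number"]: t["failure_type"] for t in tickets}
    counts: Counter = Counter()
    unmatched: List[str] = []
    for row in drive_rows:
        if row["failure"] != 1:
            continue  # only failed drives are of interest here
        kind = ticket_by_serial.get(row["serial_number"])
        if kind is None:
            unmatched.append(row["serial_number"])  # needs manual validation
        else:
            counts[kind] += 1
    return counts, unmatched


def percentages(counts: Counter) -> Dict[str, float]:
    """Share of each failure type, in percent, rounded to one decimal."""
    total = sum(counts.values())
    return {kind: round(100 * n / total, 1) for kind, n in counts.items()}
```

Fed the Q3 2024 totals (640 reactive, 721 proactive), `percentages` reproduces the 47.0% / 53.0% split in the first table below. Grouping rows by manufacturer, data center, or server type before counting yields the other breakdowns.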

Reactive vs. proactive drive failures for Q3 2024

Observation period | Reactive failures | Proactive failures | Total failures | Reactive % | Proactive %
Q3 2024 failed drives | 640 | 721 | 1,361 | 47.0% | 53.0%

Reactive vs. proactive drive failures by manufacturer for Q3 2024

Manufacturer | Reactive failures | Proactive failures | Total failures | Reactive % | Proactive %
HGST | 194 | 177 | 371 | 52.3% | 47.7%
Seagate | 258 | 272 | 530 | 48.7% | 51.3%
Toshiba | 124 | 221 | 345 | 35.9% | 64.1%
WDC | 64 | 51 | 115 | 55.7% | 44.3%

Reactive vs. proactive drive failures by Backblaze data center

Backblaze data center | Reactive failures | Proactive failures | Total failures | Reactive % | Proactive %
AMS | 36 | 77 | 113 | 31.9% | 68.1%
IAD | 50 | 92 | 142 | 35.2% | 64.8%
PHX | 179 | 201 | 380 | 47.1% | 52.9%
SAC 0 | 151 | 148 | 299 | 50.5% | 49.5%
SAC 2 | 224 | 203 | 427 | 52.5% | 47.5%

Reactive vs. proactive drive failures by server type

Server type | Reactive failures | Proactive failures | Total failures | Reactive % | Proactive %
5.0 red Storage Pod (45 drives) | 4 | 2 | 6 | 66.7% | 33.3%
6.0 red Storage Pod (60 drives) | 433 | 349 | 782 | 55.4% | 44.6%
6.1 red Storage Pod (60 drives) | 70 | 107 | 177 | 39.5% | 60.5%
Dell Server (26 drives) | 22 | 61 | 83 | 26.5% | 73.5%
Supermicro Server (60 drives) | 111 | 202 | 313 | 35.5% | 64.5%

Obviously, there are many things we could analyze here, but for the moment we just want to establish a baseline. Next, we’ll collect additional data to see how consistent and reliable our data is over time. We’ll let you know what we find.

Learning more about proactive failures

One item of interest to us is the different reasons that cause a drive to be designated as a proactive failure. Today we record the reasons for the proactive designation at the time the drive is flagged for replacement, but multiple reasons are allowed for a given drive, which makes the primary reason difficult to determine. Of course, there may be no such thing as a primary reason, as it is often a combination of factors causing the problem. That analysis could be interesting as well. Regardless of the exact reason, such drives are in bad shape, and replacing degraded drives to protect the data they store is our first priority.

Lifetime hard drive failure rates

As of the end of Q3 2024, we were tracking 288,547 operational hard drives. To be considered for the lifetime review, a drive model was required to have 500 or more drives as of the end of Q3 2024 and have over 100,000 accumulated drive days during their lifetime. When we removed those drive models which did not meet the lifetime criteria, we had 286,892 drives grouped into 25 models remaining for analysis as shown in the table below.

Downward lifetime AFR

In Q2 2024, the lifetime AFR for the drives listed was 1.47%. In Q3, the lifetime AFR went down to 1.31%, a significant decrease for the lifetime AFR from one quarter to the next. This decrease also runs contrary to the rising quarterly AFR over the same period. At first blush, that doesn’t make much sense, as an increasing quarter-to-quarter AFR should increase the lifetime AFR. Two related factors explain this seemingly contradictory data. Let’s take a look.

We’ll start with the table below which summarizes the differences between the Q2 and Q3 lifetime stats.

Period | Drive count | Drive days | Drive failures | Lifetime AFR
Q2 2024 | 283,065 | 469,219,469 | 18,949 | 1.47%
Q3 2024 | 286,892 | 398,476,931 | 14,308 | 1.31%

To create the dataset for the lifetime AFR tables, two criteria are applied: first, at the end of a given quarter, the number of drives of a drive model must be greater than 500, and, second, the number of drive days must be greater than 100,000. The first criterion ensures that the drive models are relevant to the data presented; that is, we have a significant number of each of the included drive models. The second criterion ensures that the drive models listed in the lifetime AFR table have a sufficient number of data points; that is, they have enough drive days to be significant.

As we can see in the table above, while the number of drives went up from Q2 to Q3, the number of drive days and the number of drive failures went down significantly. This is explained by comparing the drive models listed in the Q2 lifetime table versus the Q3 lifetime table. Let’s summarize.

  • Added: In Q3, we added the 20TB Toshiba drive model (MG10ACA20TE). In Q2, there were only two of these drives in service.
  • Removed: In Q3, we removed the 4TB Seagate drive model (ST4000DM000) as there were only two drives remaining as of the end of Q3, well below the criteria of 500 drives.

When we removed the 4TB Seagate drives, we also removed 80,400,065 lifetime drive days and 5,789 lifetime drive failures from the Q3 lifetime AFR computations. If the 4TB Seagate drive model data (drive days and drive failures) were included in the Q3 lifetime stats, the AFR would have been 1.50%.

Why not include the 4TB Seagate data? In other words, why have a drive count criterion at all? Shouldn’t we compute lifetime AFR using all of the drive models we have ever used which accumulated over 100,000 drive days in a lifetime? If we did things that way, the list of drive models used to compute the lifetime AFR would now include drive models we stopped using years ago, nearly 100 different drive models in all. As a result, most of the drive models used to compute the lifetime AFR would be outdated, and the lifetime AFR table would contain rows of basically useless data with no current or future value. In short, having drive count as one of the criteria in computing lifetime AFR keeps the table relevant and approachable.
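The computation can be sketched in a few lines using the Drive Stats AFR formula as Backblaze has described it in past reports: failures divided by drive-years (drive days / 365), times 100. The `ModelStats` record is a hypothetical container, and the drive count threshold follows the "500 or more" wording used earlier in this post. Plugging in the pooled totals from the lifetime table above reproduces the 1.47% and 1.31% figures.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class ModelStats:
    model: str
    drive_count: int  # drives in service at the end of the quarter
    drive_days: int   # lifetime drive days
    failures: int     # lifetime drive failures


def afr(failures: int, drive_days: int) -> float:
    """Annualized failure rate, in percent: failures per drive-year times 100."""
    return failures / (drive_days / 365) * 100


def lifetime_afr(models: Iterable[ModelStats], min_drives: int = 500,
                 min_days: int = 100_000) -> float:
    """Pool the models that meet both criteria and compute one AFR."""
    keep = [m for m in models
            if m.drive_count >= min_drives and m.drive_days > min_days]
    if not keep:
        raise ValueError("no drive models meet the criteria")
    return afr(sum(m.failures for m in keep), sum(m.drive_days for m in keep))


if __name__ == "__main__":
    # Pooled totals from the Q2 and Q3 2024 lifetime table.
    print(f"Q2 2024: {afr(18_949, 469_219_469):.2f}%")  # 1.47%
    print(f"Q3 2024: {afr(14_308, 398_476_931):.2f}%")  # 1.31%
```

Because the AFR is computed from pooled totals rather than averaged across models, dropping a model with many drive days and a relatively high failure count, as happened with the 4TB Seagate drives, can move the lifetime figure noticeably.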

The Hard Drive Stats data

It has now been over 11 years since we began recording, storing, and reporting the operational statistics of the HDDs and SSDs we use to store data at Backblaze. We look at the telemetry data of the drives, including their SMART stats and other health-related attributes. We do not read or otherwise examine the actual customer data stored.

Over the years, we have analyzed the data we have gathered and published our findings and insights from our analyses. For transparency, we also publish the data itself, known as the Drive Stats dataset. This dataset is open source and can be downloaded from our Drive Stats webpage.
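As we understand the dataset's layout, each daily file has one row per drive in service that day, with columns such as `date`, `serial_number`, `model`, `capacity_bytes`, and `failure` (1 on the day a drive fails, otherwise 0), followed by SMART attributes. Under that assumption, drive days are row counts and failures are the sum of the `failure` column. The sketch below uses only the Python standard library; the sample rows and serial numbers are made up, and with real data you would feed in the contents of the downloaded CSV files instead of the inline string.

```python
import csv
import io
from collections import defaultdict


def summarize(csv_text: str) -> dict[str, tuple[int, int]]:
    """Return {model: (drive_days, failures)} from daily Drive Stats rows.

    Assumes one row per drive per day, a 'model' column, and a 'failure'
    column that is 1 on the day a drive fails and 0 otherwise.
    """
    drive_days: dict[str, int] = defaultdict(int)
    failures: dict[str, int] = defaultdict(int)
    for row in csv.DictReader(io.StringIO(csv_text)):
        drive_days[row["model"]] += 1              # each row is one drive day
        failures[row["model"]] += int(row["failure"])
    return {m: (drive_days[m], failures[m]) for m in drive_days}


# Made-up sample rows in the assumed layout (SMART columns omitted).
sample = """date,serial_number,model,capacity_bytes,failure
2024-09-29,S1,ST4000DM000,4000787030016,0
2024-09-29,S2,ST4000DM000,4000787030016,0
2024-09-30,S1,ST4000DM000,4000787030016,1
2024-09-30,S2,ST4000DM000,4000787030016,0
"""

print(summarize(sample))  # {'ST4000DM000': (4, 1)}
```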

You can download and use the Drive Stats dataset for free for your own purposes. All we ask are three things: 1) you cite Backblaze as the source if you use the data, 2) you accept that you are solely responsible for how you use the data, and 3) you do not sell this data to anyone; it is free. You may, however, sell derivative works based on the data.

Good luck, and let us know if you find anything interesting.

The post Backblaze Drive Stats for Q3 2024 appeared first on Backblaze Blog | Cloud Storage & Cloud Backup

Solving the AI Training Data Challenge with Decart AI and Backblaze

Post Syndicated from Stephanie Doyle original https://www.backblaze.com/blog/solving-the-ai-training-data-challenge-with-decart-ai-and-backblaze/

A decorative image showing the logos of Backblaze and Decart.

Depending on which LLM you ask, we live in a world with somewhere between 25k and 80k AI startups. It’s a growing, highly competitive market where small startups with a big idea can find themselves toe-to-toe with the goliaths of tech—fighting for money, chips, talent, even raw electrical power. 

How does any company differentiate itself in an explosive burst of technological change, one that requires heavy investment in talent and infrastructure, where even the richest tech platforms on the planet don’t always succeed? Today we’re sharing the story of Decart, an AI startup that used Backblaze B2 Cloud Storage to support the successful launch of an impressive new model that delivers an order-of-magnitude improvement in both the training and inference of the largest generative models.

Backblaze is an amazing solution for AI training data. We looked at a number of options and Backblaze is seriously the best.

—Dean Leitersdorf, Co-Founder and CEO, Decart

First, the news

Decart is an AI research lab that came out of stealth on October 31 with an incredible new model:

While this might look like Minecraft, every pixel you see here and all of the gameplay are being generated by Decart’s Oasis model. It’s like Minecraft in every way you’d expect, except that the entire experience is generated by AI and you can creatively prompt the model to build beyond the confines of the game. The mind-blowing part? Decart says Oasis can perform more than 10 times as efficiently as competitors such as OpenAI’s Sora, which hasn’t been publicly released.

Don’t let the game distract you though—the Minecraft simulation is just an expression of the power of their model. According to the Decart team, this isn’t even version 1.0 of what their approach is capable of generating—more like version 0.01. Given the broad coverage they’ve already received for their launch, we’re excited to see what’s next.

How to break out in the AI market

For Decart, the strategy to pull ahead of the crowd was simple: disrupt the market on inference speed to deliver game-changing models, and do that by building the highest-performance multi-cloud model training infrastructure possible. Then, iterate on that innovation.

We crafted state of the art infrastructure that allows us to train models that other people simply can’t train.

—Dean Leitersdorf, Co-Founder and CEO, Decart

Before we met Dean and the team at Decart, most of the hard work was done: the multi-cloud AI stack for training was dialed in, and the models were being put through their paces. They had just one simple, but big, problem holding them back:

The price and the logistics of moving and storing training data were going to limit their growth.

They were burning through free data storage credits from a traditional cloud provider and had data spread across a range of other cloud providers and GPU clusters. Their training data needed to scale from hundreds of thousands of hours of video to hundreds of millions of hours, and they needed a storage solution that could handle that scale in three key areas:

  1. Reliably high performance: Decart needed to know that when they got time on a cluster, they could move data in as fast as possible the second that they were able to. 
  2. GPU interoperability: They needed to be sure that whatever storage platform they chose would work well with a multi-cluster training approach. Being able to move jobs between different GPU clouds and distribute training was essential for Dean’s team.
  3. Efficiency: Every dollar an AI startup spends on anything other than training time is a competitive disadvantage, so ensuring that storage costs were low without any surprise fees for data retention or download was key.

Decart discovered Backblaze while researching storage alternatives. After a quick call and two fast months of testing Backblaze in a wide variety of usage patterns, it was clear to the team that they had found the storage foundation they needed. 

We chose Backblaze because everything works. It’s super stable, and we had zero problems.  That’s number one.

—Dean Leitersdorf, Co-Founder and CEO, Decart

When it came time to start moving data from Backblaze to GPU clusters, they had no problem transferring petabyte-scale datasets. The only minor challenge was ensuring that the compute provider’s pipe could take the volume of data streaming in.
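A rough way to see why the compute provider’s network pipe can become the constraint is to estimate transfer time from dataset size and link speed. This back-of-the-envelope Python sketch assumes a decimal petabyte and a few hypothetical link speeds; the `efficiency` factor is an assumed allowance for protocol overhead and contention, not a measured Backblaze or Decart figure.

```python
def transfer_hours(dataset_bytes: float, link_gbps: float, efficiency: float = 1.0) -> float:
    """Hours to move dataset_bytes over a link of link_gbps gigabits per second.

    efficiency (0 < efficiency <= 1) scales the usable bandwidth; 1.0 means
    the link runs at its full nominal rate for the whole transfer.
    """
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    bits = dataset_bytes * 8
    usable_bits_per_second = link_gbps * 1e9 * efficiency
    return bits / usable_bits_per_second / 3600


PETABYTE = 1e15  # decimal petabyte, in bytes

for gbps in (10, 100, 400):
    hours = transfer_hours(PETABYTE, gbps)
    print(f"1 PB at {gbps} Gbps: {hours:.1f} hours")
# 1 PB at 10 Gbps: 222.2 hours
# 1 PB at 100 Gbps: 22.2 hours
# 1 PB at 400 Gbps: 5.6 hours
```

Setting `efficiency=0.5` doubles each estimate, so sustained throughput on the receiving side matters as much as nominal link speed.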

Here’s where things ended up working for Decart:

  • Performance: They were blown away by the performance they achieved with Backblaze (more to come on that later).
  • Price: With pricing at one-fifth the cost of traditional cloud providers, Backblaze unlocked a significant amount of budget.
  • Free egress: The true game changer. Decart, for a number of reasons, trains their models on multiple different GPU clusters at the same time. With Backblaze, they can egress their full dataset to up to three training sites at no additional cost.

B2 Cloud Storage was literally the only technical thing we used in training these models that didn’t crash the first time we tried it. We’re in an industry where everything fails, but Backblaze didn’t.

—Dean Leitersdorf, Co-Founder and CEO, Decart

Looking forward

With performance, flexibility, and affordability squared away in their data storage approach, the Decart team is now in a position to move beyond this impressive first model and build whatever comes next. With the fundamentals working at the level Backblaze always provides, and at a level Decart is happy with, the two teams are now working together to find further efficiencies and optimizations and to stand up the best possible infrastructure for training AI models.

The post Solving the AI Training Data Challenge with Decart AI and Backblaze appeared first on Backblaze Blog | Cloud Storage & Cloud Backup

Quoth the Drive Stats, Nevermore: An Elegy for Our Seagate 4TB Drives

Post Syndicated from Andy Klein original https://www.backblaze.com/blog/quoth-the-drive-stats-nevermore-an-elegy-for-our-seagate-4tb-drives/

A decorative image showing a gravestone with ravens around it.

Once upon a midnight dreary, as I typed another query

Seeking many a quaint and curious fact of hidden Drive Stats lore—

While I waited, time advancing, suddenly the stats came dancing

Lines of empty datasets; the database had nothing more

“Is that right?” I muttered, “The database had nothing more—

So those drives, I must explore.”

Ah, distinctly I remember, it was just past this September

I requested failure rates of Seagate drives with terabytes of four

Eagerly I typed the query, even though my eyes were bleary

The count of Seagate fours was eerie, eerie; there was nothing more.

The sad and certain count screamed like it never had before;

No Seagate drives with terabytes of four.

There are missing rows, I’m certain, and files waiting to explore.

The reality I kept dismissing, the Seagate data must be missing

With hours gone to data fishing, the facts shook me to the core;

The spinning life is over for our Seagate drives with terabytes of four—

Those Seagate drives are nevermore.

(My apologies to Edgar Allan Poe.)

Shortly, we will publish the Q3 2024 Backblaze Drive Stats report, and an old faithful will be missing from the tables, the 4TB Seagate drive model ST4000DM000. This drive model has graced our Drive Stats charts and tables since the very first Drive Stats report, and it would be a ghastly mistake if we let the drive slip into the afterlife unnoticed. So on this All Hallows’ Eve, it’s only fitting we say nevermore to these Seagate drives.

The first 45 of these Seagate 4TB drives were installed in a 45-drive Backblaze Storage Pod in May 2013. That was before 60-drive Storage Pods, Backblaze Vaults, and even Backblaze B2. Over the next two years, thousands of new Seagate 4TB drives were added each quarter, and by Q3 2016, there were 34,744 spinning souls in service. That represented more than 50% of all the drives in service at the time—a howling success that has not been duplicated by any other drive model.

Alas, that didn’t last: the first wave of 8TB drives arrived in mid-2016, and after that, no additional 4TB Seagate drives were procured. Over time, as 4TB Seagate drives met their maker, the count decreased, and when Storage Pods containing these drives started being phased out in 2018, the count dropped faster. The final nail in the coffin came when, in 2023, our CVT drive migration system became fixated on the replacement of all the remaining 4TB Seagate drives, and here we are.

As for those intrepid 45 original drives installed in May 2013, they were not around at the end. They were unceremoniously replaced in a Storage Pod upgrade back in 2017. A few were resurrected as drive replacements, but today they exist only in the spirit world, having died or been replaced by 2020. Still, many other 4TB Seagate drives lived long, happy lives, with nearly 100 exceeding 100 months of service (about 8.3 years) before being sent to their final resting place by the CVT reaper.

And so it is time; we shall gather in a circle, cross our arms and hold hands and chant “our Seagate drives…with terabytes of four…are nevermore!”

The post Quoth the Drive Stats, Nevermore: An Elegy for Our Seagate 4TB Drives appeared first on Backblaze Blog | Cloud Storage & Cloud Backup

Backblaze Partners with Opti9 and Adds Canadian Data Region

Post Syndicated from Teresa Dodson original https://www.backblaze.com/blog/backblaze-partners-with-opti9-and-adds-canadian-data-region/

A decorative image showing the Backblaze and Opti9 logos.

Backblaze and Opti9 are partnering up to bring Backblaze B2 Cloud Storage to joint customers around the world, as well as to businesses in Canada that are required to keep their data within national borders.

The who and the why

Opti9 is the international leader in hybrid cloud solutions, delivering managed cloud services, application development and modernization, backup and disaster recovery, security, and compliance solutions to businesses around the world. By bringing Backblaze into their solution set, Opti9 is onboarding high-performance, low-cost cloud storage that works within all the solutions they provide.

Increasingly, companies seeking managed services support are demanding solutions made up of best-in-breed providers. While traditional cloud platforms work against this principle of interoperability, Backblaze and solution providers like Opti9 are committed to delivering cloud solutions without the limitations, complexity, and high pricing that are holding businesses back.

As Jim Stechyson, President of Opti9, put it:

Backblaze and Opti9 focus on empowering businesses with the best cloud solutions available. Being able to integrate the high performance and low total cost of ownership of Backblaze’s object storage into our set of solutions will greatly enhance our ability to drive success for our customers.

How to get started

Interested resellers or customers who want to start working with Opti9 and Backblaze today can go to the Opti9 website. Check out our joint S3-compatible hot storage offering and book your demo to get started.


For customers based in Canada, Backblaze will open a new data region in Toronto in the first quarter of 2025. As part of the partnership, Opti9 will be the exclusive provider of server backup solutions in the Canadian channel for Backblaze B2 Reserve and the Powered by Backblaze program.

More about the new Backblaze data region in Canada

The new Canadian data region gives businesses the freedom to access Backblaze’s open, interoperable cloud solution, while still allowing customers to benefit from local storage and compliance. Located in Toronto, Ontario, the data center has been assessed and maintains a security program that addresses the requirements of SOC 1 Type 2, SOC 2 Type 2, ISO 27001, PCI DSS, and HIPAA. The region will be available to customers in the first quarter of 2025. 

If you’d like to receive notifications about the data region opening date and when you can start storing data in Canada, you can sign up for the waitlist today.


The post Backblaze Partners with Opti9 and Adds Canadian Data Region appeared first on Backblaze Blog | Cloud Storage & Cloud Backup