Tag Archives: Featured-Cloud Storage

5 Ways Event Notifications Strengthens Your Backup Strategy Automatically

2024-12-19 David Johnson

Post Syndicated from David Johnson original https://www.backblaze.com/blog/5-ways-event-notifications-strengthens-your-backup-strategy-automatically/

A decorative image showing a cloud with diagrammed icons around it.

“Our backups are good, right?”

If you’re responsible for backup operations, you’ve probably heard this question more times than you can count. While the answer should be a simple “yes,” staying on top of backup activities often involves checking multiple systems, reviewing logs, and maintaining manual tracking processes.

Today, I’m sharing five ways you can implement Backblaze Event Notifications into your data protection strategy to keep you and your team informed. If you’re interested in Event Notifications for other use cases, check out our posts for media production and application workflows.

Event Notifications for IT backup: Simplified automation

Event Notifications monitors your B2 Cloud Storage buckets for data changes that you designate—like completed backups, file deletions, or policy violations—and delivers real-time alerts where you want them. These alerts can trigger automated actions in any system that accepts webhooks, from PagerDuty to Zendesk to Slack channels and more.

Think of it as your storage system’s notification service: instead of discovering changes during routine recovery verification checks, you get instant awareness when something happens to the data in your buckets.

What are webhooks?

Webhooks, if you’re not familiar, are a way for applications to communicate with each other by sending data automatically based on specific events, e.g., HTTP POST requests with a JSON payload. What sets Backblaze Event Notifications apart is that it works with any service that accepts webhooks. This means you can integrate backup monitoring into your existing tools and processes, rather than being locked into specific vendors’ ecosystems.
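To make the idea concrete, here is a minimal sketch of a webhook receiver using only the Python standard library. The payload shape (an "events" list whose entries carry eventType, bucketName, objectName, and objectSize) is an assumption for illustration, as are the function names; check the Event Notifications documentation for the actual schema before relying on any field.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def summarize_events(body: bytes) -> list[str]:
    """Turn a JSON webhook body into one readable line per event.

    Assumed (hypothetical) payload shape: {"events": [{...}, ...]}.
    """
    payload = json.loads(body)
    lines = []
    for event in payload.get("events", []):
        lines.append(
            f"{event.get('eventType', 'unknown')}: "
            f"{event.get('bucketName', '?')}/{event.get('objectName', '?')} "
            f"({event.get('objectSize', 0)} bytes)"
        )
    return lines


class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        for line in summarize_events(self.rfile.read(length)):
            print(line)
        # Reply 200 so the sender knows the notification was received.
        self.send_response(200)
        self.end_headers()


def serve(port: int = 8080) -> None:
    """Run the receiver until interrupted."""
    HTTPServer(("0.0.0.0", port), WebhookHandler).serve_forever()
```

Calling serve() starts the listener. A production receiver would also authenticate incoming requests, which this sketch leaves out.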

5 ways to stay in the know with your backup strategy

Here are specific, practical ways you can take advantage of Event Notifications for immediate benefits to your backup and archive workflows.

1. Backup verification and reporting

When your backup software writes files to B2 Cloud Storage, Event Notifications helps verify successful completion of backup jobs. Each time a backup file lands in a bucket, you’ll receive a notification with key details like file size, timestamp, and backup job name. By feeding this data directly into communication tools like Slack, you can maintain comprehensive logs of backup activity without manual checks.

Backup monitoring workflow

Gone are the days of discovering backup issues hours or days later during routine reviews. You’ll know exactly when backups are uploaded. Teams can configure custom alerts for backup size thresholds and receive immediate confirmation of successful backups. With the help of Zapier, they can also raise an alert when Event Notifications does not trigger during a specified window, indicating that a backup was not uploaded.
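The two checks described here, a missed-backup window and a size threshold, can be sketched in a few lines of Python. The function names, the 24-hour window, and the 50% tolerance are illustrative assumptions, not Backblaze or Zapier settings.

```python
from datetime import datetime, timedelta


def backup_is_overdue(last_upload: datetime, now: datetime,
                      window: timedelta = timedelta(hours=24)) -> bool:
    """Return True when no upload notification arrived within the window."""
    return now - last_upload > window


def size_is_suspicious(size_bytes: int, expected_bytes: int,
                       tolerance: float = 0.5) -> bool:
    """Flag a backup whose size deviates from the expected size by more
    than the given fraction (0.5 means 50%)."""
    return abs(size_bytes - expected_bytes) > tolerance * expected_bytes
```

A scheduler would call backup_is_overdue with the timestamp of the most recent upload notification it has recorded.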

2. Security and compliance monitoring

Event Notifications can help protect your backup data from unauthorized changes. Security teams can establish automated alerts for suspicious activities like mass deletions or modifications. These alerts integrate with your existing security information and event management (SIEM) systems to provide unified threat monitoring.

Security alert workflow

Beyond threat detection, Event Notifications enables preemptive policy enforcement. Teams can configure automatic notifications that guide employees when their actions might conflict with backup policies, such as renaming, moving, or even deleting files. For persistent policy conflicts, managers can receive automated escalation alerts to address potential training needs or process gaps. This systematic approach helps maintain backup integrity through education and awareness before issues occur, rather than just detecting violations after the fact.
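One way to detect a mass deletion is a sliding-window counter over delete events. The sketch below assumes you receive one notification per deletion with a timestamp in seconds; the class name and the thresholds are hypothetical.

```python
from collections import deque


class MassDeletionDetector:
    """Flag bursts of delete events inside a sliding time window."""

    def __init__(self, max_deletes: int = 100, window_seconds: float = 60.0):
        self.max_deletes = max_deletes
        self.window_seconds = window_seconds
        self._timestamps = deque()

    def record_delete(self, timestamp: float) -> bool:
        """Record one delete event; return True if the threshold is exceeded."""
        self._timestamps.append(timestamp)
        # Drop events that have fallen out of the sliding window.
        while timestamp - self._timestamps[0] > self.window_seconds:
            self._timestamps.popleft()
        return len(self._timestamps) > self.max_deletes
```

When record_delete returns True, the caller could forward an alert to a SIEM or paging tool.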

3. Storage management automation

Storage management becomes more efficient when Event Notifications feeds activity data directly to your management tools. As files are uploaded to and removed from your buckets over time, Event Notifications provides valuable data that helps you analyze storage utilization trends and backup data growth patterns.

Data usage monitoring workflow

This constant flow of information empowers teams to anticipate capacity needs and optimize resource allocation. Moving from reactive to proactive storage management helps control costs by notifying you when backups become larger on average.
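As a rough illustration of spotting growth in average backup size, the following Python sketch compares recent backups with an earlier baseline. It assumes you have collected object sizes from upload notifications in chronological order; the function name and the seven-backup window are illustrative.

```python
from statistics import mean


def average_growth_ratio(sizes: list[int], recent: int = 7) -> float:
    """Compare the mean size of the most recent backups with the mean of
    all earlier ones. A result of 1.5 means recent backups are 50% larger."""
    if len(sizes) <= recent:
        raise ValueError("need more history than the recent window")
    baseline = mean(sizes[:-recent])
    if baseline == 0:
        raise ValueError("baseline backups have zero size")
    return mean(sizes[-recent:]) / baseline
```

A team might alert when the ratio crosses a threshold of its choosing, for example 1.25.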

4. Cross-bucket backup monitoring

Organizations using Cloud Replication or managing backups across multiple buckets gain valuable oversight through Event Notifications. This capability tracks file replication between regions and monitors backup activity across your entire footprint, giving you a comprehensive view of your distributed backup strategy. Teams can spot replication delays or issues immediately, rather than waiting for scheduled status checks.

Cloud Replication notification workflow

Understanding how data moves and grows across different locations ensures your distributed backup strategy performs as designed. Event Notifications makes it possible to track successful replications, monitor consistency between primary and replica buckets, and receive immediate alerts about any issues. This visibility is especially valuable for organizations maintaining geographic redundancy or managing complex multi-site backup strategies.
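A simple way to spot replication delays is to compare the upload events seen for the primary bucket with those seen for the replica. This Python sketch assumes you keep a map of object name to event time (in seconds) for each bucket; the names and the 15-minute lag threshold are hypothetical.

```python
def find_replication_gaps(primary: dict[str, float], replica: dict[str, float],
                          now: float, max_lag_seconds: float = 900.0) -> list[str]:
    """List objects seen in the primary bucket that have not appeared in the
    replica within the allowed lag."""
    overdue = []
    for name, uploaded_at in primary.items():
        if name not in replica and now - uploaded_at > max_lag_seconds:
            overdue.append(name)
    return sorted(overdue)
```

Objects uploaded more recently than the lag threshold are ignored, since they may simply still be in flight.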

5. Integration with IT workflows

Event Notifications connects seamlessly with existing IT tools and processes through standard webhooks. Backup events can automatically flow into ticketing systems like Jira Service Management, monitoring dashboards like Grafana, or team communication channels like Microsoft Teams and Mattermost. This integration means teams can manage backup operations through familiar tools and processes, without needing to constantly switch between different interfaces or learn new systems.

Data integration workflow

The result is streamlined operations without the need for separate backup monitoring systems, ensuring backup activities receive proper attention within normal IT procedures. Teams can create ServiceNow tickets for failed backups, update Jira boards with backup status, or send notifications to Teams channels—all automatically and in real-time.
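As one example of such an integration, the sketch below turns an event into a message for a Slack incoming webhook, which accepts a JSON body with a "text" field. The event field names are assumptions for illustration, and the webhook URL is whatever Slack issues for your channel.

```python
import json
import urllib.request


def build_slack_message(event: dict) -> dict:
    """Format one event (assumed field names) as a Slack message body."""
    return {
        "text": (
            f"Backup event {event.get('eventType', 'unknown')}: "
            f"{event.get('bucketName', '?')}/{event.get('objectName', '?')} "
            f"({event.get('objectSize', 0)} bytes)"
        )
    }


def post_to_slack(webhook_url: str, message: dict) -> int:
    """POST the message as JSON and return the HTTP status code."""
    request = urllib.request.Request(
        webhook_url,
        data=json.dumps(message).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status
```

The same pattern applies to other tools that accept JSON over HTTP; only the message shape changes.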

Why Event Notifications makes sense for backup teams

Managing backup operations has traditionally meant juggling multiple monitoring tools and hoping you catch issues before they impact recovery capabilities. Event Notifications transforms this approach by providing:

Automated awareness: Replace manual checks with instant visibility into bucket changes.
Enhanced security: Track backup data access and modifications as they happen.
Simplified monitoring: Feed backup activity data directly to your management tools.
Better operations: Free up time to focus on improving backup strategies rather than monitoring them.
Flexible integration: Adapt backup monitoring to fit your existing processes, not the other way around.

How it works with your environment

Unlike traditional backup monitoring solutions that often require specific software for notification handling, Event Notifications works with any service that accepts webhooks. This fundamental difference means you aren’t locked into specific vendors’ ecosystems or forced to use particular monitoring tools.

Event Notifications is designed for reliability with at-least-once delivery, ensuring critical backup events are never missed. This reliability is especially important for teams building automated workflows that require consistency and transparency in their backup monitoring.
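At-least-once delivery means a receiver may occasionally see the same event more than once, so automated workflows are usually written to be idempotent. Here is a minimal Python sketch that skips duplicates by event ID. The "eventId" field name is an assumption, and a real deployment would keep the seen IDs in durable storage rather than in memory.

```python
class IdempotentHandler:
    """Run an action at most once per event ID."""

    def __init__(self, action):
        self._action = action
        self._seen = set()

    def handle(self, event: dict) -> bool:
        """Process the event unless its ID was already handled.

        Returns True if the action ran, False for a duplicate.
        """
        event_id = event["eventId"]  # assumed unique-per-event field
        if event_id in self._seen:
            return False
        self._action(event)
        # Mark as seen only after the action succeeds, so a failed
        # attempt can be retried on redelivery.
        self._seen.add(event_id)
        return True
```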

The pricing model is straightforward and predictable: Backblaze B2 Reserve customers receive unlimited notifications at no additional cost, while pay-as-you-go customers get 2,500 notifications free each day and pay just $0.004 per 10,000 additional calls. This transparent pricing applies regardless of which services you’re connecting to, enabling teams to build comprehensive backup monitoring without worrying about unpredictable costs.
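Using the pay-as-you-go figures quoted above, a rough daily cost estimate looks like this in Python. The function name is illustrative, and the sketch assumes charges are prorated rather than rounded up to whole blocks of 10,000, which may not match actual billing.

```python
def daily_notification_cost(notifications: int, free_per_day: int = 2_500,
                            price_per_10k: float = 0.004) -> float:
    """Estimate the daily charge in dollars for a number of notifications."""
    billable = max(0, notifications - free_per_day)
    return billable / 10_000 * price_per_10k
```

For example, 102,500 notifications in a day leaves 100,000 billable, or about $0.04 under these assumptions.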

Ready to automate your backup monitoring?

If you’re working with a Backblaze account manager, Event Notifications are already enabled—just ask them for setup guidance. Other existing customers can contact our Support team to request access.

New to Backblaze? Contact our Sales team to learn how Event Notifications can strengthen your backup operations.
Once enabled, visit the Event Notifications section in your B2 Cloud Storage buckets to configure your alerts. For detailed setup instructions and best practices, check out our Event Notifications documentation.

The post 5 Ways Event Notifications Strengthens Your Backup Strategy Automatically appeared first on Backblaze Blog | Cloud Storage & Cloud Backup

Disaster Recovery 101: Navigating Backup and Archive Infrastructure

2024-12-17 Kari Rivas

Post Syndicated from Kari Rivas original https://www.backblaze.com/blog/disaster-recovery-101-navigating-backup-and-archive-infrastructure/

An illustration of a city scape with lines travelling up to a cloud representing digital transmission.

Aging infrastructure, strained budgets, and exponential data growth create unique challenges for disaster recovery (DR) planning. When assessing your backup and archive infrastructure, you’re probably balancing data governance, data sovereignty requirements, compliance requirements, and the needs of your end users. Many legacy data storage systems can create gaps in an otherwise airtight DR plan.

Today, I’m talking through how to approach infrastructure decisions for your cyber resilience posture. You have a lot of options. On-premises? Cloud services? Hot? Warm? Cold? What combination works best for your needs? Understanding the nuances can help you sharpen your strategy.

Disaster recovery challenges

1. Relying on on-premises backup and archive infrastructure

Traditionally, businesses have relied heavily on on-premises backup solutions. Robust storage systems hold critical data, often backed up to secondary storage within the same physical location. While this approach offers a sense of control, it also presents vulnerabilities.

On-premises backups are at risk of localized events like loss of power, fire, flooding, or other natural disasters. A geographically separate DR site or other far off-site backup is essential for complete protection and compliance. Without this, the organization risks losing critical data in the event of a regional outage or loss of access.

The shift to public cloud and SaaS options opened the door to more secure and reliable data backup and disaster recovery solutions. By utilizing cloud-based storage and backup services, organizations can ensure that their data is protected in multiple locations, reducing the risk of data loss due to localized disasters. Additionally, cloud-based solutions offer scalability and flexibility, allowing organizations to easily expand their storage capacity as needed.

2. Falling into the replication trap

Many businesses have established alternate data centers as a secondary backup layer. However, these sites frequently rely on replication alone, which can lead to what is known as the “replication trap”: data compromised by malware is replicated to the DR site as well, leading to potential data loss.

Off-site, immutable backups, independent of the primary site’s data, are a key component of a robust DR strategy. In cases of malware attacks or accidental data deletion by users, off-site immutable backups allow for data retrieval from a backup saved prior to the incident and reduce possible interruptions.

3. Underestimating LTO limitations

Despite being viewed as a legacy technology, tape backups (typically on Linear Tape-Open, or LTO, media) continue to be used in many organizations due to their reliability and cost-effectiveness. It is common to store tapes in a separate location to diversify data storage geographically, which helps reduce the impact of local disasters on data access and enhances overall data resilience.

Off-site tape backups may increase recoverability but create challenges with recovery time objectives (RTO) because of the increased time it takes to retrieve data from a separate location and restore it using tape technology. Hardware issues can happen often and unexpectedly. Cloud-based data storage and archiving has gained popularity because of higher availability and cost savings over traditional tape backups.

The cost and time required to operate multiple data centers and meet recovery times should also be considered in the requirements for your production and DR infrastructure. Never underestimate the risk to a successful recovery when facing time-consuming tasks like physical site recovery and data restoration from tape.
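A back-of-the-envelope estimate can show how quickly restore time adds up. The Python sketch below is a simplification under stated assumptions: throughput is sustained and constant, 1 TB is treated as 1,000,000 MB, and retrieval_hours stands in for any time spent fetching media from an off-site location. All names and figures are illustrative.

```python
def estimated_restore_hours(data_tb: float, throughput_mb_per_s: float,
                            retrieval_hours: float = 0.0) -> float:
    """Estimate hours to restore: media retrieval plus data transfer time."""
    transfer_seconds = data_tb * 1_000_000 / throughput_mb_per_s
    return retrieval_hours + transfer_seconds / 3600
```

For instance, 10 TB at a sustained 100 MB/s takes roughly 28 hours to transfer, before any retrieval delay is added. Comparing that figure with your RTO is a useful sanity check.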

4. Leaving cloud-based productivity tools vulnerable

Cloud-based collaboration and communication tools like Google Drive and Microsoft 365 are commonly used by businesses and yet are often left vulnerable to data loss. These services do not provide all of the protection and recovery options that organizations are likely to need.

Businesses often find that the responsibility for backing up this data falls on their own IT, as these services typically operate under a shared responsibility model that doesn’t offer comprehensive backup solutions.

To ensure a reliable DR plan that includes cloud services, you should:

Evaluate granular recovery requirements for productivity platforms like Google Workspace and Microsoft 365.
Evaluate adherence to your long-term backup retention policy, keeping in mind the regulations that your business might be subject to.
Determine if data stored in cloud platforms needs to be backed up with immutability due to cyber insurance requirements or other security policies.
Examine best practices for comprehensive, secure data protection for shared cloud drive services and SaaS productivity tools to address the lack of built-in recovery features.
Plan to store true backups of your SaaS data just as you would for any other data. It may seem redundant to back up cloud platforms to the public cloud, but doing so ensures that you have the right point-in-time backups you need and you can recover on your timeline, not Google’s or Microsoft’s.

Cloud costs will need to factor into decisions for where to store your data. Cloud storage costs should be included as a non-functional requirement to make sure you can achieve your secure recovery goals without sacrificing affordability.

Best practices for cloud-based disaster recovery

Many enterprises rely on cloud-based DR solutions to ensure uninterrupted operations, protect critical data, and maintain customer trust. Unlike traditional DR methods, cloud-based solutions offer scalability, cost-effectiveness, and rapid recovery capabilities. To truly leverage the potential of these systems, it’s important to be aware of some key strategies and considerations to optimize your cloud-based disaster recovery plan, ensuring resilience in the face of unexpected disruptions.

Consider diversifying your cloud portfolio: Using the same cloud service provider for your backups as for your production data may not be necessary, as you don’t need the same level of performance for backup data. You could consider a tiered recovery approach based on the criticality of your applications and data.
Investigate existing tools for cloud compatibility: Many on-premises data protection tools like Synology or QNAP NAS devices also support cloud targets for backup storage. It’s important to match the capabilities of your current backup vendors to your recovery requirements and cloud storage budgets.
Avoid paying for storage you’re not using: Carefully read the fine print when considering cloud storage costs. Hidden fees, minimum retention requirements, and complicated pricing tiers make accurate forecasting difficult and could leave you paying for unused storage just to reach certain discount tiers.
Balance your budget with RTO and RPO targets: Using cloud data storage for production, backups, and archive can lead to some price shock as your environment scales. And moving data to lower cost storage tiers or cold storage may achieve attractive price reductions, but it often comes at the cost of recovery speed and added complexity. Look for a cloud storage provider with transparent pricing that makes it easier to plan your costs.

Finally, you should weigh your cloud-based options to evaluate platform compatibility, ongoing costs, and whether your CSP locks you in or out of specific ecosystems due to high storage costs, data transfer costs, and proprietary features.

Leveraging cloud-based backup and archive infrastructure

Adopting cloud-based disaster recovery best practices is a key consideration for building a resilient and reliable business infrastructure. By developing a well-structured disaster recovery plan, determining the right mix of storage solutions, and optimizing costs with tiered recovery, businesses can minimize downtime and data loss during unexpected events. A proactive approach not only safeguards your organization’s operations but also strengthens customer trust and competitive advantage. In a world where disruptions are inevitable, being prepared is the key to bouncing back stronger and faster.

The post Disaster Recovery 101: Navigating Backup and Archive Infrastructure appeared first on Backblaze Blog | Cloud Storage & Cloud Backup

Effortlessly Managing Unfinished Large File Uploads with B2 Cloud Storage

2024-12-12 Bala Krishna Gangisetty

Post Syndicated from Bala Krishna Gangisetty original https://www.backblaze.com/blog/effortlessly-managing-unfinished-large-file-uploads-with-b2-cloud-storage/

An illustration of a cloud with boxes representing data uploading to the cloud.

Digital clutter isn’t just inefficient, it can be costly. And if cleaning up digital clutter in your business operations is one of your New Year’s resolutions for 2025, this post is for you. We’re talking about managing unfinished large file uploads.

One big culprit of digital clutter when it comes to cloud storage is unfinished large files. Managing unfinished large file uploads can be a complex task. If they are not managed well, they can consume space and incur costs without any benefit.

To address this, we’ve introduced a feature in Backblaze B2 Cloud Storage that automatically cancels unfinished large file uploads, saving you both time and money.

The challenge: Unfinished large file uploads

To upload a large file, you break it into smaller parts. You first send a start request, then upload the parts (in parallel, if you like), and once all parts are received, you send a finish request. Only after that final step does the file become consumable. Sometimes, things don’t go as planned: network hiccups, API timeouts, or user interruptions can leave large file uploads unfinished. The process then likely restarts and completes successfully, but this leaves you with both a complete file and a partially uploaded file in your bucket. These unfinished uploads still take up storage space, leading to unnecessary costs.
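The first step, breaking a file into parts, can be sketched as follows. This Python function only plans the byte ranges and makes no API calls; the function name is hypothetical, and any minimum part size or part-count limits imposed by the service are not enforced here.

```python
def plan_parts(file_size: int, part_size: int) -> list[tuple[int, int, int]]:
    """Return (part_number, start_offset, length) for each part.

    Parts are numbered from 1, and the last part may be shorter.
    """
    if file_size <= 0 or part_size <= 0:
        raise ValueError("file_size and part_size must be positive")
    parts = []
    offset = 0
    number = 1
    while offset < file_size:
        length = min(part_size, file_size - offset)
        parts.append((number, offset, length))
        offset += length
        number += 1
    return parts
```

Each planned range can then be uploaded independently. If any part never arrives and no finish request is sent, the parts already uploaded are what remain behind as an unfinished large file.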

Previously, users had to manually track down and delete these unfinished uploads, an error-prone and time-consuming task, especially with a large volume of files.

The solution: Canceling unfinished uploads through lifecycle rules

To streamline the process, we’ve added a feature that allows users to automatically cancel these incomplete uploads after a set number of days. By setting lifecycle rules through the B2 Native API, users can now specify how many days an unfinished large file can remain before it’s automatically deleted.

For detailed guidance on configuring this rule, check out our Lifecycle Rules Documentation.
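As a hedged illustration, a lifecycle rule of this kind might look like the Python dictionary below, along with a helper that mimics the age check. The field names reflect my understanding of the B2 Native API and should be verified against the Lifecycle Rules documentation. The helper is a local simulation only; the service decides exactly when cancellation actually runs.

```python
from datetime import datetime, timedelta

# Field names are assumptions to verify against the official documentation.
rule = {
    "fileNamePrefix": "",  # empty prefix: apply to every file in the bucket
    "daysFromUploadingToHiding": None,
    "daysFromHidingToDeleting": None,
    "daysFromStartingToCancelingUnfinishedLargeFiles": 7,
}


def is_due_for_cancel(started_at: datetime, now: datetime, rule: dict) -> bool:
    """Return True if an unfinished upload is old enough to be canceled."""
    days = rule.get("daysFromStartingToCancelingUnfinishedLargeFiles")
    return days is not None and now - started_at >= timedelta(days=days)
```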

Why it matters

This feature is useful in a variety of scenarios:

Network failures: If a network interruption prevents the final completion step, the unfinished upload will no longer remain indefinitely. Instead, it will be automatically cleared after the defined period, ensuring you aren’t paying for useless storage.
User interruptions: If an upload is manually paused or forgotten before completion, lifecycle rules will take care of these fragments, preventing forgotten uploads from lingering in storage.
Script failures: If your script fails or times out during the upload process, any incomplete files won’t go unnoticed. They’ll be cleared as per your rules, ensuring efficient storage management.

Cost-saving benefits

Unfinished uploads can quickly add up, both in storage usage and costs. By automatically canceling incomplete uploads, users can significantly reduce unnecessary expenses, keeping storage budgets under control. This is especially important for businesses with large-scale data transfers, where managing storage efficiency can have a direct impact on the bottom line.

What’s next?

Most users configure lifecycle rules through the console or Backblaze B2 command line tool (CLI), so we introduced this feature for the B2 Native API to address immediate customer needs while also laying the groundwork for integrating it into the B2 Cloud Storage web console. You can now use this feature via the CLI or B2 Native API. We’re working on adding UI support to make configuration even more accessible. Let us know in the comments if you’re looking for access to this feature via a different user interface.

In the meantime, here are a few steps you can take:

Implement lifecycle rules: Set rules that fit your upload behavior. Choose a reasonable timeframe to cancel unfinished large file uploads that balances with your cost-management goals.
Test the feature: Try configuring the lifecycle rule for a few test uploads to make sure it behaves as expected. Monitor how it handles interruptions or failures to ensure it aligns with your needs.
Monitor storage costs: Check your storage usage and billing before and after setting these rules to understand the impact on costs. Use the feedback to fine-tune your settings.
Stay tuned for UI updates: Keep an eye out for announcements regarding UI support for this feature. We’re committed to making it as intuitive and accessible as possible.

By leveraging lifecycle rules for unfinished large file uploads, you can maintain a cleaner, more efficient storage environment while saving money. For more details on configuring lifecycle rules, visit our API documentation.

The post Effortlessly Managing Unfinished Large File Uploads with B2 Cloud Storage appeared first on Backblaze Blog | Cloud Storage & Cloud Backup

AI 101: Building and Deploying an AI Model

2024-12-11 Stephanie Doyle

Post Syndicated from Stephanie Doyle original https://www.backblaze.com/blog/ai-101-building-and-deploying-an-ai-model/

A decorative image showing a computer, a cloud, and a building.

Should you build your own AI model? Or use other services to help you accelerate the process?

Once you’ve defined the problem you’re trying to solve and the AI model type that best fits your needs, these are the questions you’re faced with next—where to deploy an AI model and how to go about doing it. In most cases, there is very little reason for you to build, train, and deploy your AI model from scratch, particularly as more and more vendors are stepping in to help companies with all or some of the process. It’s fundamentally complex, takes tons of resources and requires specialized knowledge to do correctly.

Still, you should have a basic understanding of the AI model training and deployment processes, as these learnings will be useful later on as you explore various predefined tools, applications, and services you can use to expedite or enhance your ability to use AI within your organization. That’s what I’m digging into today.

How AI model training works

There are several steps in training an AI model: identifying and gathering the required data, cleansing and assembling it, training the model, checkpointing, and, finally, model serving, where the model is deployed into the production environment. Here’s an overview of the process.

A diagram that explains the AI model training process.

Let’s take a minute to explore each of the steps in a little more detail.

Step 1: Review

The organizational data needed to help educate your model will either be structured or unstructured. Structured data is found in databases, tables, and so on. Unstructured data is basically everything else. Some unstructured data is easy to process, such as text files, while other data is harder to extract, such as PDFs and images.

In general, the more data you can provide, the better your trained model can be. But remember to include data that is not what you want as well. This helps models home in on the specific piece of information when things are similar. Take this example scenario:

You are monitoring hundreds of thousands of wooded acres to determine if there is a fire on the land. As part of training the model, you need to provide images of the legitimate flora and fauna along with images of fire. But you should also provide images of what is not fire, for example reflections of the sun or moon on a lake, a group of lightning bugs at night, car headlights, and so on.

Step 2: Clean

As the data is collected, it will need to be pre-processed, which involves several techniques such as cleaning the data to handle missing values, removing outliers, scaling features, encoding categorical variables, and splitting the data into training and testing sets. The data needs to be arranged in a manner acceptable to the model itself. This sounds relatively simple, but some studies show that this can take up to 80% of the total model development process time.
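The pre-processing techniques listed above can be sketched on a toy dataset using only the Python standard library. This is an illustration under simple assumptions (one numeric feature, one categorical label, mean imputation, a three-standard-deviation outlier rule, min-max scaling); real pipelines typically use dedicated libraries and choose each technique to suit the data.

```python
import random
from statistics import mean, pstdev


def clean_and_split(values: list, labels: list,
                    test_fraction: float = 0.25, seed: int = 0):
    """Return (train, test) lists of (scaled_value, one_hot_label) rows."""
    # 1. Handle missing values: fill None with the mean of known values.
    fill = mean(v for v in values if v is not None)
    filled = [fill if v is None else v for v in values]

    # 2. Remove outliers more than 3 standard deviations from the mean.
    mu, sigma = mean(filled), pstdev(filled)
    kept = [(v, label) for v, label in zip(filled, labels)
            if sigma == 0 or abs(v - mu) <= 3 * sigma]

    # 3. Scale the feature to the range [0, 1].
    low = min(v for v, _ in kept)
    span = (max(v for v, _ in kept) - low) or 1.0

    # 4. Encode the categorical label as a one-hot vector.
    categories = sorted({label for _, label in kept})
    rows = [((v - low) / span, [int(label == c) for c in categories])
            for v, label in kept]

    # 5. Shuffle and split into training and testing sets.
    random.Random(seed).shuffle(rows)
    cut = int(len(rows) * (1 - test_fraction))
    return rows[:cut], rows[cut:]
```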

Step 3: Stage

This is a collection point for all of the clean, ready-to-process data. The data will arrive as it is cleaned, which can take several days or even weeks. Having this data on hand will be useful if the model is not generated correctly, or as a starting point for retraining the model in the future.

Typically large amounts of your data will be cleaned and staged as it is readied to train the AI model. But, there are no special storage requirements for this data. It just needs to be readily available to be uploaded to the AI training environment when the time comes.

Step 4: Train

Model training is a resource-intensive process where data is copied from staging to high-performance storage located in close proximity to whatever high-powered processor you’re rocking, usually a graphics processing unit (GPU). The GPUs then run the algorithms developed specifically for training the model, and the data is iteratively read and processed an indeterminate number of times until training is complete. Minimizing the time spent utilizing these expensive, high-powered storage and processing resources is critical in managing the overall cost of building the model. In other words: get in, process, and get out.

Step 5: Checkpoint

During the building of the model, the training program will often create snapshots of the state of the training process, including various variables, state changes, and so on. These snapshots are referred to as checkpoints. They initially will be written to local storage within the model training system, and are used to restart the training process from a known good state if something goes wrong.

Once the model training process is complete, checkpoints should be written to the same centralized data storage location as your staged data. The checkpoint data will become part of the documentation of the model and may be used for forensic purposes should the model not behave appropriately once it is deployed.
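To show how checkpoints let training resume from a known good state, here is a deliberately tiny Python sketch: it fits y = w * x by gradient descent and saves its state every few steps. The function name, the JSON file format, and the toy model are all illustrative assumptions; real frameworks provide their own checkpoint formats.

```python
import json
import os


def train(data, steps, checkpoint_path, checkpoint_every=10, lr=0.1):
    """Fit y = w * x by gradient descent, resuming from a checkpoint if present."""
    state = {"step": 0, "w": 0.0}
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            state = json.load(f)  # restart from the last known good state

    while state["step"] < steps:
        # Gradient of the mean squared error with respect to w.
        grad = sum(2 * (state["w"] * x - y) * x for x, y in data) / len(data)
        state["w"] -= lr * grad
        state["step"] += 1

        if state["step"] % checkpoint_every == 0:
            # Write to a temporary file and rename it, so a crash never
            # leaves a half-written checkpoint behind.
            tmp_path = checkpoint_path + ".tmp"
            with open(tmp_path, "w") as f:
                json.dump(state, f)
            os.replace(tmp_path, checkpoint_path)
    return state
```

If the process is interrupted, calling train again with the same checkpoint path picks up at the last saved step rather than at step zero.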

Step 6: Serve

Once the training process is complete, the model can be exported to your central storage location. This will once again help document the system, and from there the model can then be uploaded to the local or cloud compute environment where it will be used.

At this point you have a clean version of the source data, the checkpoints of the model created, and a copy of the model itself, all stored in your centralized location under your control and readily available should they be needed in the future.

AI model inference

The term inference is derived from the AI model’s perspective. At a high level, when given a prompt, the model infers its response from the trained model and its data. In simple terms, you’ve trained your model to recognize cats, and then you bring it new data (a picture of a family reunion) and ask your model if it sees any cats in the photo (I’m hoping the answer is yes).

In AI, the prompt is viewed as new data which is compared to the model’s existing data to determine a response typically in the form of a decision, prediction, or new content as is the case with generative AI models.

An overview of the inference process is below:

In some AI systems, the inference process flow includes some additional code to help improve your model. These types of filters can have a range of uses and can happen on either the input or the output stage. For example, if you want to filter inappropriate queries or information, you could include something like keyword filtering when data (the prompt) is input. Or, you could introduce a toxicity detection filter on the output side, which reviews responses and prevents harmful or offensive content from being presented to the user.
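A minimal Python sketch of this input and output filtering pattern follows. The blocklist, the function names, and the refusal messages are hypothetical, and the model and output filter are passed in as plain callables so the example stays self-contained. Real toxicity detection is far more involved than a keyword match.

```python
import re

BLOCKED_KEYWORDS = {"password", "ssn"}  # hypothetical blocklist for illustration


def passes_input_filter(prompt: str) -> bool:
    """Reject prompts that contain any blocked keyword."""
    words = set(re.findall(r"[a-z0-9]+", prompt.lower()))
    return not (words & BLOCKED_KEYWORDS)


def guarded_inference(prompt: str, model, output_filter) -> str:
    """Run the model only if the prompt passes, and release the response
    only if the output filter approves it."""
    if not passes_input_filter(prompt):
        return "Request declined by input filter."
    response = model(prompt)
    if not output_filter(response):
        return "Response withheld by output filter."
    return response
```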

A perhaps better understood problem that filters like this can address is how to get accurate and up-to-date information out of your queried response. On the input flow side of things, retrieval-augmented generation (RAG) directs a trained model to incorporate and weight more heavily information from trusted sources that the user designates. On the output side, you might add a hallucination prevention filter, which would stop the model from presenting false or misleading information.
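The retrieval step of RAG can be illustrated with a toy Python example that ranks trusted documents by word overlap with the query and places the best matches in the prompt. This is a sketch only: production systems usually rank with vector embeddings rather than word overlap, and the function names and prompt wording here are assumptions.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase the text and return its set of words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query: str, documents: list[str], k: int = 2) -> list[str]:
    """Return up to k documents sharing the most words with the query."""
    query_words = tokenize(query)
    ranked = sorted(documents,
                    key=lambda doc: len(query_words & tokenize(doc)),
                    reverse=True)
    # Keep only documents that share at least one word with the query.
    return [doc for doc in ranked[:k] if query_words & tokenize(doc)]


def build_prompt(query: str, documents: list[str], k: int = 2) -> str:
    """Assemble a prompt that puts the retrieved sources ahead of the question."""
    context = "\n".join(f"- {doc}" for doc in retrieve(query, documents, k))
    return f"Answer using only these sources:\n{context}\n\nQuestion: {query}"
```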

More broadly, you’ll notice that both the prompt and response are saved. It is important to review this information on a periodic basis. This is especially true if the model is public facing, if you are using a model which can change over time such as a foundation model, or if you are using a model which utilizes RAG techniques to include new or external content.

In all of those examples, your model can drift as new information is introduced, and, as we noted above, getting the right information and cleaning it properly is likely the most time-intensive and important stage of this process. Not for nothing is the phrase “knowledge is power” a truism—in the age of AI, knowledge is power and good data is king.

The post AI 101: Building and Deploying an AI Model appeared first on Backblaze Blog | Cloud Storage & Cloud Backup

Seamless Data Migration with Custom Upload Timestamps

2024-12-05 Bala Krishna Gangisetty

Post Syndicated from Bala Krishna Gangisetty original https://www.backblaze.com/blog/seamless-data-migration-with-custom-upload-timestamps/

A decorative image showing two cubes, representing data, moving from cloud to cloud. There are clocks above each cube.

Migrating data to the cloud? Ensuring that original timestamps remain intact through a cloud migration can be a critical factor for successful data management at scale. Losing these timestamps can lead to operational challenges that hinder your ability to track data effectively, set proper lifecycle rules, create custom events, and more.

Backblaze B2 Cloud Storage now offers the Custom Upload Timestamps feature to help you manage your data. Today, I’m sharing details on the new feature, benefits, and how to enable it.

What are Custom Upload Timestamps?

The Custom Upload Timestamps feature is designed specifically to retain the original timestamps of your files during a migration. It is especially beneficial for users who rely on lifecycle rules to delete or archive files based on age for compliance, who track file age manually, or who need to maintain the historical context of their files.

Imagine this scenario: You have a critical file on another cloud storage provider, governed by a lifecycle rule that deletes it after 1,000 days. If you move the file to Backblaze B2 on day 999, the timestamp would be overwritten and you’d have to restart that lifecycle from day one. However, with this new feature, the original timestamp remains intact, and the file will still get deleted on day 1,000, just as planned. This capability not only simplifies the migration process, but also ensures continuity in your data retention policies, keeping your storage costs in line with expectations.
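To make the arithmetic in that scenario concrete, here is a small sketch. The upload date is hypothetical and the function only models a lifecycle rule keyed to upload time; it is not how B2 evaluates rules internally.

```python
from datetime import date, timedelta

RETENTION_DAYS = 1000  # lifecycle rule from the scenario above


def deletion_date(upload_date: date, retention_days: int = RETENTION_DAYS) -> date:
    """Date on which a rule keyed to the upload timestamp would delete the file."""
    return upload_date + timedelta(days=retention_days)


original_upload = date(2022, 1, 1)                      # hypothetical original upload date
migration_day = original_upload + timedelta(days=999)   # file is migrated on day 999

# Timestamp overwritten at migration: the retention clock restarts.
reset = deletion_date(migration_day)
# Original timestamp preserved: the retention clock keeps running.
preserved = deletion_date(original_upload)

extra_days_stored = (reset - preserved).days
print(f"preserved timestamp: deleted on {preserved}")
print(f"reset timestamp:     deleted on {reset} ({extra_days_stored} extra days stored)")
```

With the timestamp preserved, the file is deleted one day after the migration; with a reset timestamp, it would be kept for a further 999 days.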

Benefits of Custom Upload Timestamps

Lifecycle rules play a crucial role in managing data retention, particularly when migrating large datasets. Losing the original timestamps means you’d have to manually reconfigure your rules or wait much longer for lifecycle events to take effect. The benefits of retaining original timestamps extend beyond just lifecycle rules.

Here is why this feature is essential:

Operational efficiency: Knowing the original timestamp of files allows for better organization and tracking. This is vital for businesses that rely on historical data to inform decisions or manage projects. When timestamps reset, it can lead to confusion and disarray in managing files. You may find yourself dealing with files that should have been deleted or archived but aren’t because of the reset timeline.
Compliance: For organizations that must adhere to regulatory standards for data retention, preserving timestamps can help meet legal requirements. It provides a clear audit trail and evidence of when files were created or modified.
Decreased workload: Manually tracking and reconfiguring lifecycle rules consumes valuable time and resources. By retaining the original timestamps, you eliminate unnecessary workloads.
File age tracking: Whether you’re managing backups, archival processes, or simple organizational tasks, knowing the age of a file can inform your decisions regarding when to review or delete files.
Historical context: For projects that span long periods, retaining timestamps helps maintain the context of data. This can be critical for collaborative efforts or projects that require consistent documentation.

Ultimately, the Custom Upload Timestamps feature supports greater data portability, making it easier to move and manage large datasets. It ensures that migration to B2 Cloud Storage is as seamless as possible, without the need to reset or alter your data management policies.

Ready to get started?

The Custom Upload Timestamps feature is enabled by default for all B2 Cloud Storage customers. To use this feature, include the X-Bz-Custom-Upload-Timestamp header when calling the b2_upload_file API. This simple addition allows you to retain the original timestamp of your file, thereby preserving its lifecycle state without interruptions and ensuring that your data remains organized and easy to track.
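As a rough sketch of what that looks like in an upload script, the helper below builds the request headers for a b2_upload_file call. It assumes the timestamp is expressed as milliseconds since the Unix epoch in UTC, and that the other headers follow the B2 Native API conventions; the helper names are hypothetical, and the exact requirements should be confirmed against the API documentation before use. No network call is made here.

```python
import hashlib
from datetime import datetime, timedelta, timezone
from urllib.parse import quote

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_epoch_millis(moment: datetime) -> int:
    """Convert a timezone-aware datetime to whole milliseconds since the Unix epoch."""
    return (moment - EPOCH) // timedelta(milliseconds=1)


def build_upload_headers(upload_auth_token: str, file_name: str,
                         data: bytes, original_upload_time: datetime) -> dict:
    """Headers for a b2_upload_file request (hypothetical helper, not an SDK call)."""
    return {
        "Authorization": upload_auth_token,
        "X-Bz-File-Name": quote(file_name),   # file names are percent-encoded
        "Content-Type": "b2/x-auto",          # let B2 infer the content type
        "Content-Length": str(len(data)),
        "X-Bz-Content-Sha1": hashlib.sha1(data).hexdigest(),
        # Assumption: value is milliseconds since the epoch, UTC.
        "X-Bz-Custom-Upload-Timestamp": str(to_epoch_millis(original_upload_time)),
    }


headers = build_upload_headers(
    upload_auth_token="<token from b2_get_upload_url>",
    file_name="archive/2020/report.pdf",
    data=b"hello",
    original_upload_time=datetime(2020, 1, 1, tzinfo=timezone.utc),
)
print(headers["X-Bz-Custom-Upload-Timestamp"])
```

The headers would then be sent in a POST to the upload URL returned by b2_get_upload_url, with the file bytes as the request body.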

By retaining the original timestamps, Backblaze B2 helps increase the ease and granularity with which you can manage your data, especially for organizations migrating large volumes of data. You can transition your data while maintaining control over important metadata like the original timestamp, streamlining your operations, improving overall efficiency, and avoiding the stress of potential compliance issues.

What next?

To make the most of the Custom Upload Timestamps feature, consider the following actionable steps:

Review your migration workflow. Before starting the migration, ensure that your upload scripts or API calls include the X-Bz-Custom-Upload-Timestamp header. This will help prevent any disruption in tracking important metadata.
Test the feature. Conduct a pilot migration with a small number of files. This will allow you to confirm that the timestamps are retained correctly. Monitor the behavior of your data tracking after this test migration to ensure everything operates as expected.
Verify lifecycle rules. Once you complete the migration, take the time to check that your lifecycle policies continue to function as intended on B2 Cloud Storage. This verification step is crucial to avoid unexpected data retention issues.
Engage with Support. If you have any questions or encounter challenges, don’t hesitate to reach out to our Support team. We’re here to help you make the most of B2 Cloud Storage.

For more details, visit our API documentation to ensure you’re ready for a smooth migration. By leveraging the Custom Upload Timestamps feature, you can simplify your data management processes.

The post Seamless Data Migration with Custom Upload Timestamps appeared first on Backblaze Blog | Cloud Storage & Cloud Backup

Troubleshooting Disaster Recovery Scenarios: 10 Mistakes to Avoid

2024-12-03 Kari Rivas

Post Syndicated from Kari Rivas original https://www.backblaze.com/blog/troubleshooting-disaster-recovery-scenarios-10-mistakes-to-avoid/

A decorative image showing a hammer smashing a drive.

When it comes to disaster recovery (DR), hope isn’t a plan. Yet I’ve seen the same story play out too many times: Companies find themselves scrambling when the unthinkable happens, discovering that their disaster recovery strategy is, well, full of holes. It’s like packing a parachute: You don’t want to find out what you missed when you’re already falling through the air. From my experience, there are some common mistakes businesses make that can turn a manageable problem into a fire drill.

In this post, I’m sharing the top 10 disaster recovery mistakes I’ve come across when helping businesses think through their disaster recovery posture so that you can strengthen your own safety net. By avoiding these mistakes and implementing a comprehensive DR plan, you can ensure a rapid and efficient recovery from unforeseen disruptions.

1. Proximity paradox

A geographically close disaster recovery site offers limited protection. A natural disaster impacting your primary location could easily disable the nearby DR facility as well. And, if you don’t have a DR site, this could still apply to your business if you keep your backups nearby, such as in a tape storage facility down the road.

How Pittsburg State solved the proximity paradox

Pittsburg State University is located in Kansas in the heart of tornado alley. Disaster planning is nonnegotiable, and the university didn’t want to take chances with their data. See how they set up a robust private cloud with nodes across the state and backed all of their data up to immutable cloud storage with Backblaze B2.

2. Untested backups

Backups that haven’t been restored and verified are unreliable. Regularly test your backups to ensure a smooth recovery process during a disaster.

3. Replication trap

Relying solely on replication for DR creates a single point of failure. If your primary site is compromised, the replicated data at the DR site might be compromised as well. Off-site full and incremental backups are essential.

4. Paper plan peril

A DR plan gathering dust on a shelf is useless. Conduct regular drills to simulate disaster scenarios and expose weaknesses in your plan.

5. Snapshot snafu

Snapshots are not comprehensive backups. Using snapshots for long-term storage and retention introduces both technical and compliance risks related to how snapshots are managed. This affects both cloud and on-premises platforms.

6. SaaS surprises

Software as a service (SaaS) providers like Microsoft 365 and Google Workspace focus on high availability, but they operate on a shared responsibility model, meaning they may have limited built-in protection and recovery options. You may not be managing servers, but you do need a comprehensive data protection plan including regular, incremental backups outside of the SaaS platform.

7. Unforeseen force majeure

Disasters come in all shapes and sizes. Don’t limit your DR plan to common IT disruptions. Consider scenarios like widespread power outages or communication breakdowns, and plan accordingly. The goal is holistic cyber resilience—not only identifying threats and protecting against them, but also withstanding attacks as they’re happening and responding effectively.

8. Backup infiltration

Bad actors are increasingly targeting backups to increase the chances of a payout. Utilize immutable backups, unchangeable after creation, for an extra layer of protection against ransomware attacks.

9. Cloud drive disasters

Storing data on Google Drive, Dropbox, OneDrive, etc. is incredibly common. But these platforms do not protect against ransomware and provide limited point-in-time recovery options. Cloud drives are not a sufficient backup of your data.

10. Overlooking compliance

Factor in compliance needs when building your data protection and DR strategy. Regulations like HIPAA, GDPR, and others may have security or archival requirements that should be considered in your plan.

Invest in cyber resilience

After working in the disaster recovery space, I can tell you this: It’s not just about having a plan; it’s about having one that works when it counts. The mistakes I’ve covered here are common, but they’re also avoidable. Take the time to address these now, and you’re not only protecting your systems and data, but your company’s future. For me, a strong DR plan is an investment in resilience, and it’s there to catch you when you need it most.

The post Troubleshooting Disaster Recovery Scenarios: 10 Mistakes to Avoid appeared first on Backblaze Blog | Cloud Storage & Cloud Backup

5 Ways to Use Event Notifications to Advance Your Media Better, Faster

2024-11-26 Jeremy Milk

Post Syndicated from Jeremy Milk original https://www.backblaze.com/blog/5-ways-to-use-event-notifications-to-advance-your-media-better-faster/

A decorative image showing a cloud with digital lines and media icons.

In the hurry-up-and-wait world of media production, anything you can do to speed through the hurry-ups and avoid or shorten the waits is not just a gift—it’s an advantage that can mean happier team members, delighted clients and fans, and more revenue.

Backblaze Event Notifications can help. This new B2 Cloud Storage feature can help you streamline a range of your production tasks, like automatically starting transcoding of video and distributing new images, across your preferred workflow tools.

Today, I’m sharing five ways you can use Backblaze Event Notifications to operationalize media production efficiencies. If you’re interested in Event Notifications for applications, check out this post; and stay tuned for a future post on how to use Event Notifications for IT backup.

Event Notifications for media production: Simplified automation

Event Notifications monitors your B2 Cloud Storage for data changes that you designate—think raw video uploads, content version updates, deletions, etc.—and delivers near real-time alerts where you want them about these changes. These alerts can be used to create awareness faster, and even more powerfully, to initiate streamlined end-to-end processes that can save you time and hassle, and avoid unnecessary manual tasks and/or the cost of complex intermediaries.

What are webhooks?

Webhooks, if you’re not familiar with the term, are HTTP-based callback functions that enable event-based communications between software applications. Backblaze Event Notifications can uniquely work with any external service that accepts webhooks. This means you can use it to your advantage across your media production workflow—and this is novel when most vendors’ alerts features are limited to closed ecosystems or require significant and sometimes costly workarounds to communicate beyond a limited set of production tools.

Top 5 use cases for media production

Here are specific, practical ways people producing and managing media can take advantage of Event Notifications for immediate benefits.

1. New content processing

Event Notifications can be used to trigger tasks immediately after new content is uploaded. Imagine one of your team members uploads a video or image: Event Notifications can be sent to a transcoding service to format it and a tagging service to categorize it for better content organization. You can also set it up to extract valuable metadata, all in near real time and without manual intervention.

General workflow (abbreviated)

By automating these processes, companies can ensure that user-generated content is formatted correctly, appropriately tagged, and moderated without delay. This not only saves time but guarantees a consistent user experience.

What’s more, you can even go full Jedi Knight and handle errors programmatically with Event Notifications logic that triggers reprocessing tasks whenever issues arise.

2. Integrated alerts in go-to tools

Event Notifications can easily integrate with your communication tools like Slack and productivity tools like Zapier, to inform internal and external stakeholders of updates without them needing to check for them manually. Users have told us this is a great way to keep people updated when assets are added, updated, or advanced to key stages in production and post cycles—setting them up to consider taking downstream actions that don’t lend themselves to further process automation.

Asset change announcement workflow

Additionally, for teams using workflow tools such as Zapier to connect various services, Event Notifications makes it simple to trigger actions across multiple platforms, enabling powerful, automated workflows with your data in B2 Cloud Storage.

3. Over-the-top (OTT) streaming automation

Regardless of whether your streaming model is AVOD, TVOD, or SVOD, Event Notifications can help automate processing and distribution workflows. Users can enable them so that every time a new title is added to B2 Cloud Storage, it then triggers alerts that initiate transcoding, compression, and prep for delivery or playback via content delivery network (CDN).

OTT streaming platform workflow

4. Backup completion monitoring

An important (if unglamorous) aspect of managing media is backing it up for extra safekeeping. After all, it’s a precious asset worth a lot of money now and later. So whether you back up nightly, monthly, at project’s end, or on some other cadence, with Event Notifications you can set up alerts for when your media backups are successfully uploaded to a Backblaze B2 Bucket, giving you peace of mind that your data is protected.

We’ve also had a few users already tell us that not seeing backup completion alerts when expected helped them realize that they had other, previously unknown workflow hiccups to address.

Backup complete workflow

Tangentially related, media organizations are also using Backblaze Cloud Replication to programmatically store their content to two or more geographically distributed locations for added protection—this isn’t the same as Event Notifications, but is another automation tool for enhancing your protection posture.

5. Monitor data usage

Since Event Notifications messages are sent within seconds of files being uploaded and deleted, and they contain the size of the file in question, you can easily and reliably track your data usage in near real time, helping you identify trends and potential issues. For example, if you know large raw files are coming in but the messages report much smaller file sizes than expected, that can prompt you to QC the uploads.
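A size check like the one described could be sketched as follows. The field names (eventType, objectName, objectSize) and the "b2:ObjectCreated:" prefix are assumptions about the notification payload and should be checked against the Event Notifications documentation; the threshold and sample events are made up for illustration.

```python
def undersized_uploads(events, min_expected_bytes):
    """Return names of newly created objects that are smaller than expected."""
    flagged = []
    for event in events:
        is_upload = event.get("eventType", "").startswith("b2:ObjectCreated:")
        size = event.get("objectSize") or 0
        if is_upload and size < min_expected_bytes:
            flagged.append(event["objectName"])
    return flagged


# Hypothetical events: raw camera files are expected to be at least 1 GB.
sample_events = [
    {"eventType": "b2:ObjectCreated:Upload", "objectName": "raw/scene01.mov", "objectSize": 4_200_000_000},
    {"eventType": "b2:ObjectCreated:Upload", "objectName": "raw/scene02.mov", "objectSize": 12_000},
    {"eventType": "b2:ObjectDeleted:Delete", "objectName": "raw/old.mov", "objectSize": 500},
]

needs_qc = undersized_uploads(sample_events, min_expected_bytes=1_000_000_000)
print(needs_qc)
```

Only the second upload is flagged; the deletion is ignored because the check looks at newly created objects only.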

We’ve also seen such data monitoring prove highly beneficial to the IT personnel who support production teams, because near real-time monitoring allows faster responses to situations as they are happening, thereby mitigating risks, reducing costs, and/or nipping issues in the bud so the production teams remain disruption and distraction free.

Monitoring workflow

Beyond these example use cases, Event Notifications opens up a wide range of possibilities for automating and optimizing workflows. This flexibility makes it easy to automate how your infrastructure interacts with and reacts to file changes in B2 Cloud Storage, simplifying workflows across your distributed services. So go ahead and get creative—and please do share with us the cool things you’re doing with Event Notifications.

Why Event Notifications matter for production workflows

The benefits of real-time notifications extend beyond simply saving time—they transform the way teams work, automate processes, and reduce the margin for error.

Awareness: Instant notifications for uploads, updates, or deletions keep everyone on the same page.
Actionable insights: Real-time alerts provide critical information that helps make informed decisions quickly.
Flexibility: Direct connections to services like media asset managers (MAMs), transcoding applications, and CDNs mean more choice to stick with your preferred stack and less lock-in to specific vendors or tools.
Cost efficiency: Automating tasks like media transcoding, data processing, or content delivery reduces the need for manual labor, saving on operational costs and freeing up resources for other strategic initiatives.
Improved security: By instantly alerting teams to changes or unusual activity, Event Notifications help maintain data integrity and support proactive security measures.

How Event Notifications compares

Unlike other offerings like Amazon’s messaging services, which are limited to specific ecosystems, Backblaze Event Notifications integrates directly with any service that accepts webhooks, offering true flexibility and avoiding vendor lock-in.

Event Notifications is also designed for at-least-once delivery, ensuring critical notifications are not missed. This reliability is important for teams building workflows that require precision and a level of consistency their end users expect.
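One practical consequence of at-least-once delivery is that a receiver may occasionally see the same event more than once, so handlers are usually written to be idempotent. The sketch below deduplicates on an event identifier; the eventId field name is an assumption about the payload, and an in-memory set stands in for whatever durable store a real service would use.

```python
def process_once(events, seen_ids, handler):
    """Run handler for each event not seen before; return how many were handled."""
    handled = 0
    for event in events:
        event_id = event["eventId"]   # assumed unique per event
        if event_id in seen_ids:
            continue                  # duplicate delivery: skip it
        handler(event)
        seen_ids.add(event_id)        # record only after the handler succeeds
        handled += 1
    return handled


seen = set()
processed = []

first_delivery = [{"eventId": "evt-1"}, {"eventId": "evt-2"}]
redelivery = [{"eventId": "evt-2"}, {"eventId": "evt-3"}]

first_count = process_once(first_delivery, seen, processed.append)
second_count = process_once(redelivery, seen, processed.append)
print(first_count, second_count, [e["eventId"] for e in processed])
```

Recording the identifier only after the handler returns means a failed handler will be retried on the next delivery rather than silently dropped.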

The pricing for Event Notifications is simple and transparent. Backblaze B2 Reserve customers enjoy unlimited free Event Notifications, while pay-as-you-go Backblaze B2 customers enjoy 2,500 calls per day free and then $0.004 per 10,000 transactions. This straightforward pricing applies no matter the service receiving the notification. This enables businesses to confidently scale their event-driven workflows, knowing exactly what to expect in terms of costs, regardless of the services they choose to integrate with.

Ready to add automation to your media tasks?

For existing customers working with a Backblaze account manager, Event Notifications is already enabled for you, and your account manager can assist with any questions. If you’re an existing customer not currently working with an account manager, please contact our Support team to request access to Event Notifications.

New customers can contact our Sales team to learn more about how Event Notifications can streamline workflows and how to get started.

Once Event Notifications are enabled, log in to your Backblaze B2 account, navigate to the Buckets page, and click on the Event Notifications section. From there, you can set up notification rules for the events you want to track or configure notifications using our API.

For detailed instructions and best practices, visit our Event Notifications documentation.

The post 5 Ways to Use Event Notifications to Advance Your Media Better, Faster appeared first on Backblaze Blog | Cloud Storage & Cloud Backup

7 Ways to Use Event Notifications to Streamline Application Workflows

2024-11-14 Amrit Singh

Post Syndicated from Amrit Singh original https://www.backblaze.com/blog/7-ways-to-use-event-notifications-to-streamline-application-workflows/

A decorative image showing a cloud with an alert symbol.

Event-driven infrastructure is at the core of modern application development. It helps businesses streamline processes like transcoding user-uploaded video or processing images for tagging, kicks off downstream workflows immediately, and reduces complexity by automating multi-step processes across distributed services.

Today, I’m sharing seven ways you can use Backblaze Event Notifications to accelerate application workflows, automate processes, streamline operations, and scale revenue. If you’re interested in Event Notifications, but you’re not using it to run applications, stay tuned for future posts sharing use cases for media management and backup and archive.

Event Notifications for applications: Simplified automation

Event Notifications delivers near real-time alerts for changes in B2 Cloud Storage, simplifying workflows across the services that interact with your stored data. Teams can use Event Notifications to create end-to-end processes that scale efficiently and integrate directly with any external service that accepts webhooks. This means no more manual monitoring of storage or relying on complex intermediaries.

What are webhooks?

Webhooks, if you’re not familiar, are a way for applications to communicate with each other by sending data automatically based on specific events, e.g., HTTP POST requests with a JSON payload. Notably, our Event Notifications feature isn’t limited to a closed ecosystem or subset of business tools.
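As a minimal illustration of the receiving side, the function below parses the JSON body of a webhook POST and routes each event by type. The payload shape (an "events" list whose items carry eventType and objectName) and the event type prefixes are assumptions for this sketch, and the handlers are placeholders rather than real service calls.

```python
import json


def route_events(body: bytes, routes: dict) -> list:
    """Parse a webhook body and call the first handler whose prefix matches each event type."""
    payload = json.loads(body)
    results = []
    for event in payload.get("events", []):
        event_type = event.get("eventType", "")
        for prefix, handler in routes.items():
            if event_type.startswith(prefix):
                results.append(handler(event))
                break
    return results


# Placeholder handlers; a real service would enqueue jobs or call other APIs.
routes = {
    "b2:ObjectCreated:": lambda e: f"transcode {e['objectName']}",
    "b2:ObjectDeleted:": lambda e: f"purge cache for {e['objectName']}",
}

sample_body = json.dumps({
    "events": [
        {"eventType": "b2:ObjectCreated:Upload", "objectName": "clips/intro.mov"},
        {"eventType": "b2:ObjectDeleted:Delete", "objectName": "clips/old.mov"},
    ]
}).encode("utf-8")

actions = route_events(sample_body, routes)
print(actions)
```

In a deployed receiver, this function would sit behind an HTTP endpoint that accepts POST requests and returns a 2xx status once the events have been accepted.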

Automating common application tasks with Event Notifications allows you to reduce operational overhead by minimizing manual monitoring, accelerate processes across integrations with your preferred services, and reduce manual entry errors that can cost enterprises time and money.

Top 7 use cases for applications

Let’s explore some practical ways Event Notifications can be leveraged within your tech stack:

1. User-generated content processing

For applications dealing with user-uploaded content, Event Notifications can be used to trigger tasks immediately upon data upload. Imagine a user uploading a video or image: An Event Notification could be sent to a transcoding service to format it, a tagging service to categorize it, or even a moderation tool to ensure it complies with your community standards—all in near real time, without manual intervention.

Social platform workflow

2. Integrated alerts with automation tools

Event Notifications can easily integrate with productivity tools like Slack and Zapier, or any service that accepts a webhook, making it easy to provide team-wide awareness into changes in your storage environment without manual checks. This keeps teams informed and at the ready to be able to respond immediately to critical events.

Asset tracking and monitoring workflow

Additionally, for teams using workflow platforms such as Zapier to connect various services, Event Notifications makes it simple to trigger actions across multiple platforms, enabling powerful, automated workflows with your data in B2 Cloud Storage.

3. Surveillance and streaming automation

For applications managing large video files, such as surveillance or streaming platforms, Event Notifications can help automate the processing and distribution workflows. Videos can be transcoded, compressed, and prepared for delivery or playback promptly.

Streaming platform workflow

This automation is also useful for time-sensitive content, where quick turnaround is essential. Automating video processing reduces the manual effort involved and ensures content is always ready for viewing in the preferred format as soon as it’s available.

4. AI workload automation

For businesses building AI applications, Event Notifications can be used to trigger AI workloads in real time, enabling faster processing and response. For instance, when new data is uploaded, alerts can trigger downstream services to process that data, such as converting images to text or analyzing content for insights.

AI image to text workflow

In this case, this AI workflow ensures tasks start the moment data becomes available. Whether you’re running an image recognition service, analyzing datasets, or building AI models, Event Notifications eliminates the delays that come with manual processing. No matter what your downstream service is, Event Notifications provides the flexibility to integrate seamlessly with your AI workflows, improving real-time processing capabilities and enabling teams to focus on delivering better solutions rather than managing manual data flow.

5. Monitor data usage

Since Event Notifications messages are sent within seconds of files being uploaded and deleted, and contain the size of the file in question, you can easily and reliably track your data usage in near real time, helping you identify trends and potential issues.
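A running usage total can be kept by adding the size of each uploaded file and subtracting the size of each deleted one. This sketch assumes each event carries eventType and objectSize fields and that event types start with "b2:ObjectCreated:" or "b2:ObjectDeleted:"; treat those names as assumptions and verify them against the documentation.

```python
def apply_event(total_bytes: int, event: dict) -> int:
    """Return the new stored-bytes total after applying one event."""
    size = event.get("objectSize") or 0
    event_type = event.get("eventType", "")
    if event_type.startswith("b2:ObjectCreated:"):
        return total_bytes + size
    if event_type.startswith("b2:ObjectDeleted:"):
        return total_bytes - size
    return total_bytes  # other event types do not change usage in this sketch


# Hypothetical stream of events for one bucket.
stream = [
    {"eventType": "b2:ObjectCreated:Upload", "objectSize": 5_000},
    {"eventType": "b2:ObjectCreated:Upload", "objectSize": 2_500},
    {"eventType": "b2:ObjectDeleted:Delete", "objectSize": 5_000},
]

usage = 0
for event in stream:
    usage = apply_event(usage, event)
print(usage)
```

Because delivery may repeat an event, a production tracker would combine this with deduplication so the same upload is not counted twice.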

Monitoring workflow

In contrast with periodic usage reports, near real-time monitoring allows you to respond to situations as they are happening, mitigating risks and potentially reducing costs.

6. Respond to security events

Event Notifications can feed near real-time data to security information and event management (SIEM) systems, allowing you to detect and respond to anomalous access patterns as they are happening.

Security alert workflow

Event Notifications allows you to take a proactive, rather than reactive, security posture, again mitigating risks and reducing costs.

7. Automatically trigger data integration

Event Notifications enable your data integration workloads to run within seconds of new data being uploaded to Backblaze B2, continuously delivering data to analytical systems and dashboards, giving you a live view of the state of your business.

Data integration workflow

Delivering data to dashboards within seconds or minutes of its availability enables near real-time insights, faster decision-making, and the ability to react to events as they occur.

Beyond these example use cases, Event Notifications opens up a wide range of possibilities for automating and optimizing workflows. You can use Event Notifications to automate metadata extraction and tagging for better content organization, and handle errors programmatically by triggering reprocessing tasks whenever issues arise. This flexibility makes it easy to automate how your infrastructure interacts with and reacts to data changes in B2 Cloud Storage, simplifying workflows across your distributed services.

Why Event Notifications matter for application workflows

The benefits of real-time notifications extend beyond simply saving time—they transform the way teams work, automate processes, and reduce the margin for error.

Awareness: Instant notifications for data changes, uploads, or deletions keep everyone on the same page.
Actionable insights: Whether it’s confirming a successful upload or catching an unexpected change, real-time alerts provide critical information that helps make informed decisions quickly.
Flexibility: Direct connections to services like transcoding, compute, or serverless applications mean more choice and less lock-in to specific vendors or tools.
Improved security: By instantly alerting teams to unauthorized changes or unusual activity, Event Notifications help maintain data integrity and support proactive security measures.
Cost efficiency: Automating tasks like media transcoding, data processing, or content delivery reduces the need for manual labor, saving on operational costs and freeing up resources for other strategic initiatives.

How Event Notifications compares

Unlike other offerings like Amazon’s messaging services, which are limited to specific ecosystems, Event Notifications integrates directly with any service that accepts webhooks, offering true flexibility and avoiding vendor lock-in.

The pricing for Event Notifications is simple and transparent, with 2,500 calls per day free, and just $0.004 per 10,000 transactions. This straightforward pricing applies no matter the service receiving the notification. This enables businesses to confidently scale their event-driven workflows, knowing exactly what to expect in terms of costs, regardless of the services they choose to integrate with.
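For a quick estimate, the pricing above can be turned into a few lines of arithmetic. This sketch assumes charges are prorated per call beyond the free daily allowance rather than rounded up to whole blocks of 10,000; actual billing may differ.

```python
FREE_CALLS_PER_DAY = 2_500
PRICE_PER_10K_CALLS = 0.004  # USD, from the pricing described above


def estimated_daily_cost(calls_per_day: int) -> float:
    """Estimated daily charge in USD, assuming prorated billing past the free tier."""
    billable = max(0, calls_per_day - FREE_CALLS_PER_DAY)
    return billable / 10_000 * PRICE_PER_10K_CALLS


for calls in (1_000, 2_500, 102_500, 1_002_500):
    print(f"{calls:>9,} calls/day -> ${estimated_daily_cost(calls):.4f}")
```

Under these assumptions, a workload sending about a million notifications a day would cost roughly $0.40 per day.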

Ready to add automation to your application?

New customers can contact our Sales team to learn more about how Event Notifications can streamline workflows and how to get started.

For detailed instructions and best practices, visit our Event Notifications documentation.

The post 7 Ways to Use Event Notifications to Streamline Application Workflows appeared first on Backblaze Blog | Cloud Storage & Cloud Backup

Backblaze Drive Stats for Q3 2024

2024-11-12 Andy Klein

Post Syndicated from Andy Klein original https://www.backblaze.com/blog/backblaze-drive-stats-for-q3-2024/

A decorative image that displays the words Q3 2024 Drive Stats.

As of the end of Q3 2024, Backblaze was monitoring 292,647 hard disk drives (HDDs) and solid state drives (SSDs) in our cloud storage servers located in our data centers around the world. We removed from this analysis 4,100 boot drives, consisting of 3,344 SSDs and 756 HDDs. This leaves us with 288,547 hard drives under management to review for this report. We’ll review the annualized failure rates (AFRs) for Q3 2024 and the lifetime AFRs of the qualifying drive models. Along the way, we’ll share our observations and insights on the data presented and, as always, we look forward to you doing the same in the comments section at the end of the post.

Hard drive failure rates for Q3 2024

For our Q3 2024 quarterly analysis, we remove the following from consideration: drive models which did not have at least 100 drives in service at the end of the quarter, drive models which did not accumulate 10,000 or more drive days during the quarter, and individual drives which exceeded their manufacturer’s temperature specification during their lifetime. The removed pool totalled 471 drives, leaving us with 288,076 drives grouped into 29 drive models for our Q3 2024 analysis.
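The drive counts and model-level filters above can be checked with a short sketch. The per-model figures below are invented for illustration, the per-drive temperature exclusion is not modeled, and the AFR function assumes the usual definition of failures per drive-year expressed as a percentage, which this excerpt does not spell out.

```python
from dataclasses import dataclass

# Drive counts quoted in the report.
total_monitored = 292_647
boot_drives = 3_344 + 756                        # SSD + HDD boot drives
data_drives = total_monitored - boot_drives      # drives under management
analyzed = data_drives - 471                     # after removing the excluded pool


@dataclass
class ModelStats:
    model: str
    drive_count: int
    drive_days: int
    failures: int


def qualifies(stats: ModelStats) -> bool:
    """Model-level criteria: at least 100 drives and at least 10,000 drive days."""
    return stats.drive_count >= 100 and stats.drive_days >= 10_000


def afr_percent(stats: ModelStats) -> float:
    """Annualized failure rate: failures per drive-year, as a percentage (assumed definition)."""
    return stats.failures / (stats.drive_days / 365) * 100


# Hypothetical models, not taken from the Drive Stats table.
models = [
    ModelStats("MODEL-A", drive_count=1_200, drive_days=100_000, failures=5),
    ModelStats("MODEL-B", drive_count=60, drive_days=5_400, failures=1),
]

kept = [m for m in models if qualifies(m)]
print(data_drives, analyzed, [(m.model, round(afr_percent(m), 3)) for m in kept])
```

MODEL-B is dropped because it fails both thresholds, which is why small populations do not appear in the quarterly table.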

The table below lists the AFRs and related data for these drive models. The table is sorted ascending by drive size then ascending by AFR within drive size.

Notes and observations on the Q3 2024 Drive Stats

Upward AFR. The quarter-to-quarter AFR continues to creep up, rising from 1.71% in Q2 2024 to 1.89% in Q3 2024. The rise can’t be attributed to the aging 4TB drives, as our CVT drive migration system continues to replace these drives. As a consequence, the AFR for the remaining 4TB drives was 0.26% in Q3. The primary culprit is the collection of 8TB drives, which are now on average over seven years old. As a group, the AFR for the 8TB drives rose to 3.04% in Q3 2024, up from 2.31% in Q2. The CVT team is gearing up to begin the migration of 8TB drives over the next few months.
Yet another golden oldie is gone. You may have noticed that the 4TB Seagate drives (model: ST4000DM000) are missing from the table. All of the Backblaze Vaults containing these drives have been migrated, and as a consequence there are only two of these drives remaining, not enough to make the quarterly chart. You can read more about their demise in our recent Halloween post.
A new drive in town. In Q3, the 20TB Toshiba drives (model: MG10ACA20TE) arrived in force, populating three complete Backblaze Vaults of 1,200 drives each. Over the last few months our drive qualification team put the 20TB drive model through its paces and, having passed the test, it is now on the list of drive models we can deploy.
One zero. For the second quarter in a row, the 16TB Seagate (model: ST16000NM00J) drive model had zero failures. With only 185 drives in service, there is a lot of potential variability in the future, but for the moment, they are settling in quite well.
The nine year club. There are no data drives with 10 or more years of service, but there are 39 drives that are nine years or older. They are all 4TB HGST drives (model: HMS5C4040ALE640) spread across 31 different Storage Pods, in five different Backblaze Vaults and two different data centers. Will any of those drives make it to 10 years? Probably not, given that four of the five vaults have started their CVT migrations and will be gone by the end of the year. And, while the fifth vault is not scheduled for migration yet, it is just a matter of time before all of the 4TB drives we are using will be gone.

Reactive and proactive drive failures

In the Drive Stats dataset schema, there is a field named failure, which displays either a 1 for failure or a 0 for not failed. Over the years in various posts, we have stated that for our purposes drive failure is either reactive or proactive. Furthermore, we have suggested that failed drives fall basically evenly into these two categories. We’d like to put some data behind that 50/50 number, but first let’s start by defining our two categories of drive failure, reactive and proactive.

Reactive: A reactive failure is when any of the following conditions occur: the drive crashes and refuses to boot or spin up, the drive won’t respond to system commands, or the drive won’t stay operational.
Proactive: A proactive failure is generally anything not a reactive failure, and typically is when one or more indicators such as SMART stats, FSCK (file system) checks, etc., signal that the drive is having difficulty and drive failure is highly probable. Typically a multitude of indicators are present in drives declared as proactive failures.

A drive that is removed and replaced as either a proactive or reactive failure is considered a drive failure in Drive Stats unless we learn otherwise. For example, a drive is experiencing communications errors and command timeouts and is scheduled for a proactive drive replacement. During the replacement process, the data center tech realizes the drive does not appear to be fully seated. After gently securing the drive, further testing reveals no issues and the drive is no longer considered failed. At that point, the Drive Stats dataset is updated accordingly.

As noted above, the Drive Stats dataset includes the failure status (0 or 1) but not the type of failure (proactive or reactive). That’s a project for the future. To get a breakdown of the different types of drive failure, we have to interrogate the data center maintenance ticketing system used by each data center to record any maintenance activities on Storage Pods and related equipment. Historically, the drive failure data was not readily accessible, but a recent software upgrade now allows us access to this data for the first time. So in the spirit of Drive Stats, we’d like to share our drive failure types with you.

Drive failure type stats

Q3 2024 will be our starting point for any drive failure type stats we publish going forward. For consistency, we will use the same drive models listed in the Drive Stats quarterly report, in this case Q3 2024. For this period, there were 1,361 drive failures across 29 drive models.

We actually have been using the data center maintenance data for several years as each quarter we validate the failed drives reported by the Drive Stats system with the maintenance records. Only validated failed drives are used for the Drive Stats reports we publish quarterly and in the data we publish on our Drive Stats webpage.

The recent upgrades to the data center maintenance ticketing system have not only made the drive failure validation process easier, but also allow us to easily join the two sources together. This gives us the ability to look at the drive failure data across several different attributes as shown in the tables below. We’ll start with the number of failed drives in each category and go from there. This will form our baseline data.
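A join like the one described above could be sketched as follows. This is only an illustration: the field names serial_number, model, and failure follow the published Drive Stats schema, but the ticket records, their failure_type field, the "unvalidated" label, and all sample values are hypothetical, since the format of the maintenance ticketing system is not described here.

```python
from collections import Counter

# Rows shaped like the Drive Stats dataset (hypothetical sample values).
drive_stats_rows = [
    {"serial_number": "SN001", "model": "MODEL-A", "failure": 1},
    {"serial_number": "SN002", "model": "MODEL-A", "failure": 0},
    {"serial_number": "SN003", "model": "MODEL-B", "failure": 1},
    {"serial_number": "SN004", "model": "MODEL-B", "failure": 1},
]

# Hypothetical maintenance tickets keyed by the same serial number.
tickets = [
    {"serial_number": "SN001", "failure_type": "reactive"},
    {"serial_number": "SN003", "failure_type": "proactive"},
]


def join_failures(rows, ticket_rows):
    """Attach a failure type to each failed drive, matching on serial number."""
    type_by_serial = {t["serial_number"]: t["failure_type"] for t in ticket_rows}
    joined = []
    for row in rows:
        if row["failure"] != 1:
            continue  # only failed drives are of interest here
        # Failed drives without a ticket are flagged for follow-up.
        failure_type = type_by_serial.get(row["serial_number"], "unvalidated")
        joined.append({**row, "failure_type": failure_type})
    return joined


joined = join_failures(drive_stats_rows, tickets)
counts = Counter(row["failure_type"] for row in joined)
print(dict(counts))
```

Keying both sources on the drive serial number keeps the join simple, and the fallback label makes failed drives with no matching ticket easy to spot during validation.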

Reactive vs. proactive drive failures for Q3 2024

Observation period	Reactive failures	Proactive failures	Total failures	Reactive %	Proactive %
Q3 2024 failed drives	640	721	1,361	47.0%	53.0%

Reactive vs. proactive drive failures by manufacturer

Manufacturer	Reactive failures	Proactive failures	Total failures	Reactive %	Proactive %
HGST	194	177	371	52.3%	47.7%
Seagate	258	272	530	48.7%	51.3%
Toshiba	124	221	345	35.9%	64.1%
WDC	64	51	115	55.7%	44.3%

Reactive vs. proactive drive failures by Backblaze data center

Backblaze data center	Reactive failures	Proactive failures	Total failures	Reactive %	Proactive %
AMS	36	77	113	31.9%	68.1%
IAD	50	92	142	35.2%	64.8%
PHX	179	201	380	47.1%	52.9%
SAC 0	151	148	299	50.5%	49.5%
SAC 2	224	203	427	52.5%	47.5%

Reactive vs. proactive drive failures by server type

Server type	Reactive failures	Proactive failures	Total failures	Reactive %	Proactive %
5.0 red Storage Pod (45 drives)	4	2	6	66.7%	33.3%
6.0 red Storage Pod (60 drives)	433	349	782	55.4%	44.6%
6.1 red Storage Pod (60 drives)	70	107	177	39.5%	60.5%
Dell Server (26 drives)	22	61	83	26.5%	73.5%
Supermicro Server (60 drives)	111	202	313	35.5%	64.5%
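The percentage columns in these tables follow directly from the failure counts. As a quick check, the sketch below recomputes the reactive and proactive shares from the by-manufacturer counts listed above; the function and variable names are illustrative only.

```python
# (reactive, proactive) failure counts by manufacturer, from the table above.
failures = {
    "HGST": (194, 177),
    "Seagate": (258, 272),
    "Toshiba": (124, 221),
    "WDC": (64, 51),
}


def split_percentages(reactive, proactive):
    """Return the reactive and proactive shares as percentages, one decimal."""
    total = reactive + proactive
    return round(100 * reactive / total, 1), round(100 * proactive / total, 1)


shares = {name: split_percentages(r, p) for name, (r, p) in failures.items()}

# Summing across manufacturers gives the overall Q3 2024 split.
total_reactive = sum(r for r, _ in failures.values())
total_proactive = sum(p for _, p in failures.values())
overall = split_percentages(total_reactive, total_proactive)

for name, (reactive_pct, proactive_pct) in shares.items():
    print(f"{name}: {reactive_pct}% reactive, {proactive_pct}% proactive")
print(f"Overall: {overall[0]}% reactive, {overall[1]}% proactive")
```

The same function applies unchanged to the data center and server type tables, whose counts add up to the same totals.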

Obviously, there are many things we could analyze here, but for the moment we just want to establish a baseline. Next, we’ll collect additional data to see how consistent and reliable our data is over time. We’ll let you know what we find.

Learning more about proactive failures

One item of interest to us is the different reasons that cause a drive to be designated as a proactive failure. Today we record the reasons for the proactive designation at the time the drive is flagged for replacement, but currently multiple reasons are allowed for a given drive. This makes the primary reason difficult to determine. Of course, there may be no such thing as a primary reason, as it is often a combination of factors causing the problem. That analysis could be interesting as well. Regardless of the exact reason, such drives are in bad shape and replacing degraded drives to protect the data they store is our first priority.

Lifetime hard drive failure rates

As of the end of Q3 2024, we were tracking 288,547 operational hard drives. To be considered for the lifetime review, a drive model was required to have 500 or more drives as of the end of Q3 2024 and have over 100,000 accumulated drive days during their lifetime. When we removed those drive models which did not meet the lifetime criteria, we had 286,892 drives grouped into 25 models remaining for analysis as shown in the table below.

Downward lifetime AFR

In Q2 2024, the lifetime AFR for the drives listed was 1.47%. In Q3, the lifetime AFR went down to 1.31%, a significant decrease from one quarter to the next. This decrease also runs contrary to the increase in the quarterly AFR over the same period. At first blush, that doesn’t make much sense, as an increasing quarter-to-quarter AFR should increase the lifetime AFR. There are two related factors which explain this seemingly contradictory data. Let’s take a look.

We’ll start with the table below which summarizes the differences between the Q2 and Q3 lifetime stats.

Period	Drive count	Drive days	Drive failures	Lifetime AFR
Q2 2024	283,065	469,219,469	18,949	1.47%
Q3 2024	286,892	398,476,931	14,308	1.31%
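The lifetime AFR column can be reproduced from the drive days and drive failures columns. The sketch below assumes the annualized failure rate is computed as failures divided by drive years (drive days / 365), expressed as a percentage; the function name is illustrative.

```python
def annualized_failure_rate(drive_failures, drive_days):
    """Return AFR as a percentage: failures per drive year of service."""
    drive_years = drive_days / 365
    return 100 * drive_failures / drive_years


# Values taken from the Q2 and Q3 2024 rows of the table above.
q2_afr = annualized_failure_rate(18_949, 469_219_469)
q3_afr = annualized_failure_rate(14_308, 398_476_931)

print(f"Q2 2024 lifetime AFR: {q2_afr:.2f}%")
print(f"Q3 2024 lifetime AFR: {q3_afr:.2f}%")
```

Because both the numerator and the denominator shrank between quarters, the direction of the AFR change depends on which shrank proportionally more, which is what the explanation that follows works through.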

To create the dataset for the lifetime AFR tables, two criteria are applied: first, at the end of a given quarter, the number of drives of a drive model must be greater than 500, and, second, the number of drive days must be greater than 100,000. The first criterion ensures that the drive models are relevant to the data presented; that is, we have a significant number of each of the included drive models. The second criterion ensures that the drive models listed in the lifetime AFR table have a sufficient number of data points; that is, they have enough drive days to be significant.
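The two criteria amount to a simple filter over per-model totals. Here is a minimal sketch, assuming each drive model is summarized as one record; the model names, field names, and numbers are hypothetical and chosen only to exercise both thresholds.

```python
MIN_DRIVE_COUNT = 500      # drives in service at the end of the quarter
MIN_DRIVE_DAYS = 100_000   # accumulated lifetime drive days

# Hypothetical per-model totals as of the end of a quarter.
models = [
    # Plenty of drives, but too new to have enough drive days.
    {"model": "MODEL-A", "drive_count": 1_200, "drive_days": 96_000},
    # A long history, but almost no drives left in service.
    {"model": "MODEL-B", "drive_count": 2, "drive_days": 80_000_000},
    # Meets both criteria.
    {"model": "MODEL-C", "drive_count": 12_000, "drive_days": 9_000_000},
]


def meets_lifetime_criteria(summary):
    """True when a drive model passes both the count and the drive-day test."""
    return (
        summary["drive_count"] > MIN_DRIVE_COUNT
        and summary["drive_days"] > MIN_DRIVE_DAYS
    )


included = [m["model"] for m in models if meets_lifetime_criteria(m)]
print(included)
```

A model like the hypothetical MODEL-B, with a long history but only a couple of drives still in service, drops out of the lifetime table and takes all of its accumulated drive days and failures with it.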

As we can see in the table above, while the number of drives went up from Q2 to Q3, the number of drive days and the number of drive failures went down significantly. This is explained by comparing the drive models listed in the Q2 lifetime table versus the Q3 lifetime table. Let’s summarize.

Added: In Q3, we added the 20TB Toshiba drive model (MG10ACA20TE). In Q2, there were only two of these drives in service.
Removed: In Q3, we removed the 4TB Seagate drive model (ST4000DM000) as there were only two drives remaining as of the end of Q3, well below the criteria of 500 drives.

When we removed the 4TB Seagate drives we also removed 80,400,065 lifetime drive days and 5,789 lifetime drive failures from the Q3 lifetime AFR computations. If the 4TB Seagate drive model data (drive days and drive failures) was included in the Q3 Lifetime stats, the AFR would have been 1.50%.

Why not include the 4TB Seagate data? In other words, why have a drive count criterion at all? Shouldn’t we compute lifetime AFR using all of the drive models we have ever used which accumulated over 100,000 drive days in a lifetime? If we did things that way, the list of drive models used to compute the lifetime AFR would now include drive models we stopped using years ago and would include nearly 100 different drive models. As a result, a majority of the drive models used to compute the lifetime AFR would be outdated and the lifetime AFR table would contain rows of basically useless data that has no current or future value. In short, having drive count as one of the criteria in computing lifetime AFR keeps the table relevant and approachable.

The Hard Drive Stats data

It has now been over 11 years since we began recording, storing, and reporting the operational statistics of the HDDs and SSDs we use to store data at Backblaze. We look at the telemetry data of the drives, including their SMART stats and other health related attributes. We do not read or otherwise examine the actual customer data stored.

Over the years, we have analyzed the data we have gathered and published our findings and insights from our analyses. For transparency, we also publish the data itself, known as the Drive Stats dataset. This dataset is open source and can be downloaded from our Drive Stats webpage.

You can download and use the Drive Stats dataset for free for your own purpose. All we ask are three things: 1) you cite Backblaze as the source if you use the data, 2) you accept that you are solely responsible for how you use the data, and 3) you do not sell this data to anyone; it is free. You may, however, sell derivative works based on the data.

Good luck, and let us know if you find anything interesting.

The post Backblaze Drive Stats for Q3 2024 appeared first on Backblaze Blog | Cloud Storage & Cloud Backup

Solving the AI Training Data Challenge with Decart AI and Backblaze

2024-11-04 Stephanie Doyle

Post Syndicated from Stephanie Doyle original https://www.backblaze.com/blog/solving-the-ai-training-data-challenge-with-decart-ai-and-backblaze/

A decorative image showing the logos of Backblaze and Decart.

Depending on which LLM you ask, we live in a world with somewhere between 25k and 80k AI startups. It’s a growing, highly competitive market where small startups with a big idea can find themselves toe-to-toe with the goliaths of tech—fighting for money, chips, talent, even raw electrical power.

How does any company differentiate itself in an explosive burst of technological change, one that requires a lot of investment in talent and infrastructure, where even the richest tech platforms on the planet don’t always succeed? Today we’re sharing the story of Decart, an AI startup that used Backblaze B2 Cloud Storage to support a successful launch of an impressive new model that provides an order of magnitude improvement in both the training and inferencing of the largest generative models.

Backblaze is an amazing solution for AI training data. We looked at a number of options and Backblaze is seriously the best.

—Dean Leitersdorf, Co-Founder and CEO, Decart

First, the news

Decart is an AI research lab that came out of stealth on October 31 with an incredible new model:

1/ We are excited to introduce Oasis, the world's first real-time AI world model, developed in collaboration with @Etched. Imagine a video game entirely generated by AI, or a video you can interact with—constantly rendered at 20 fps, in real-time, with zero latency pic.twitter.com/WAJFRyfTzS

— Decart (@DecartAI) October 31, 2024

While this might look like Minecraft, every pixel you see here and all of the gameplay is being generated by Decart’s Oasis model. It’s like Minecraft in every way you’d expect, except that the entire experience is being generated by AI and you can creatively prompt the model to build beyond the confines of the game. The mindblowing part? Decart says Oasis can perform more than 10 times more efficiently than competitors such as OpenAI’s Sora, which hasn’t been publicly released.

Don’t let the game distract you though—the Minecraft simulation is just an expression of the power of their model. According to the Decart team, this isn’t even version 1.0 of what their approach is capable of generating—more like version 0.01. Given the broad coverage they’ve already received for their launch, we’re excited to see what’s next.

How to break out in the AI market

For Decart, the strategy to pull ahead of the crowd was simple: Disrupt the market on inference speed to deliver game changing models, and do that by building the most high-performance multi-cloud model training infrastructure possible. Then, iterate on that innovation.

We crafted state of the art infrastructure that allows us to train models that other people simply can’t train.

—Dean Leitersdorf, Co-Founder and CEO, Decart

Before we met Dean and the team at Decart, most of the hard work was done: the multi-cloud AI stack for training was dialed in and the models were going through the paces. They just had one simple, but big, problem holding them back:

The price and the logistics of moving and storing training data were going to limit their growth.

They were burning through free data storage credits from a traditional cloud provider and had data spread across a range of other cloud providers and GPU clusters. Their training data needed to scale from 100s of thousands of hours of video data to 100s of millions of hours, and they needed a storage solution that could handle that scale in three key areas:

Reliably high performance: Decart needed to know that when they got time on a cluster, they could move data in as fast as possible the second that they were able to.
GPU interoperability: They needed to be sure that whatever storage platform they chose, it would work well with a multi-cluster training approach. Being able to shop jobs between different GPU clouds and disperse training was essential for Dean’s team.
Efficiency: Every dollar an AI startup spends on anything other than training time is a competitive disadvantage, so ensuring that storage costs were low without any surprise fees for data retention or download was key.

Decart discovered Backblaze while researching storage alternatives. After a quick call and two fast months of testing Backblaze in a wide variety of usage patterns, it was clear to the team that they had found the storage foundation they needed.

We chose Backblaze because everything works. It’s super stable, and we had zero problems. That’s number one.

—Dean Leitersdorf, Co-Founder and CEO, Decart

When it came time to start moving data from Backblaze to GPU clusters, they had no problem with transferring petabyte-scale datasets. The only minor challenge was ensuring that the compute provider’s pipe could take the volume of data streaming in.

Here’s where things ended up working for Decart:

Performance: They were blown away by the performance they achieved with Backblaze (more to come on that later).
Price: With pricing at one-fifth the cost of traditional cloud providers, Backblaze unlocked a significant amount of budget.
Free egress: The true game changer. Decart, for a number of reasons, trains their models on multiple different GPU clusters at the same time. With Backblaze, they can egress their full dataset to up to three training sites with zero additional cost.

B2 Cloud Storage was literally the only technical thing we used in training these models that didn’t crash the first time we tried it. We’re in an industry where everything fails, but Backblaze didn’t.

—Dean Leitersdorf, Co-Founder and CEO, Decart

Looking forward

With performance, flexibility, and affordability squared away in their data storage approach, the Decart team is now in position to rotate out of this impressive first model and build whatever is next. With the fundamentals working at the level Decart needs, the two teams are now working together to find even more efficiency and optimization and to stand up the best possible infrastructure for training AI models.

The post Solving the AI Training Data Challenge with Decart AI and Backblaze appeared first on Backblaze Blog | Cloud Storage & Cloud Backup

Quoth the Drive Stats, Nevermore: An Elegy for Our Seagate 4TB Drives

2024-10-31 Andy Klein

Post Syndicated from Andy Klein original https://www.backblaze.com/blog/quoth-the-drive-stats-nevermore-an-elegy-for-our-seagate-4tb-drives/

A decorative image showing a gravestone with ravens around it.

Once upon a midnight dreary, as I typed another query

Seeking many a quaint and curious fact of hidden Drive Stats lore—

While I waited, time advancing, suddenly the stats came dancing

Lines of empty datasets; the database had nothing more

“Is that right?” I muttered, “The database had nothing more—

So those drives, I must explore.”

Ah, distinctly I remember, it was just past this September

I requested failure rates of Seagate drives with terabytes of four

Eagerly I typed the query, even though my eyes were bleary

The count of Seagate fours was eerie, eerie; there was nothing more.

The sad and certain count screamed like it never had before;

No Seagate drives with terabytes of four.

There are missing rows, I’m certain, and files waiting to explore.

The reality I kept dismissing, the Seagate data must be missing

With hours gone to data fishing, the facts shook me to the core;

The spinning life is over for our Seagate drives with terabytes of four—

Those Seagate drives are nevermore.

(My apologies to Edgar Allan Poe.)

Shortly, we will publish the Q3 2024 Backblaze Drive Stats report, and an old faithful will be missing from the tables, the 4TB Seagate drive model ST4000DM000. This drive model has graced our Drive Stats charts and tables since the very first Drive Stats report, and it would be a ghastly mistake if we let the drive slip into the afterlife unnoticed. So on this All Hallows’ Eve, it’s only fitting we say nevermore to these Seagate drives.

The first 45 of these Seagate 4TB drives were installed in a 45-drive Backblaze Storage Pod in May 2013. That was before 60-drive Storage Pods, Backblaze Vaults, and even Backblaze B2. Over the next two years, thousands of new Seagate 4TB drives were added each quarter, and by Q3 2016, there were 34,744 spinning souls in service. That represented more than 50% of all the drives in service at the time—a howling success that has not been duplicated by any other drive model.

Alas, that didn’t last as the first wave of 8TB drives arrived in mid-2016 and with that, no additional 4TB Seagate drives were procured. Over time, as 4TB Seagate drives met their maker, the count decreased, and when Storage Pods containing these drives started being phased out in 2018, the count dropped faster. The final nail in the coffin came when, in 2023, our CVT drive migration system became fixated on the replacement of all the remaining 4TB Seagate drives, and here we are.

As for those intrepid 45 original drives installed in May 2013, they were not around at the end. They were unceremoniously replaced in a Storage Pod upgrade back in 2017. A few were resurrected as drive replacements, but today they only exist in the spirit world, having died or been replaced by 2020. Still, many other 4TB Seagate drives have lived long happy lives, with nearly 100 exceeding 100 months of service (8.3 years) before being sent to their final resting place by the CVT reaper.

And so it is time; we shall gather in a circle, cross our arms and hold hands and chant “our Seagate drives…with terabytes of four…are nevermore!”

The post Quoth the Drive Stats, Nevermore: An Elegy for Our Seagate 4TB Drives appeared first on Backblaze Blog | Cloud Storage & Cloud Backup

Backblaze Partners with Opti9 and Adds Canadian Data Region

2024-10-30 Teresa Dodson

Post Syndicated from Teresa Dodson original https://www.backblaze.com/blog/backblaze-partners-with-opti9-and-adds-canadian-data-region/

A decorative image showing the Backblaze and Opti9 logos.

Backblaze and Opti9 are partnering up to bring Backblaze B2 Cloud Storage to joint customers around the world as well as businesses in Canada who are required to keep their data within national borders.

The who and the why

Opti9 is the international leader in hybrid cloud solutions that delivers managed cloud services, application development and modernization, backup and disaster recovery, security, and compliance solutions to businesses around the world. By bringing Backblaze into their solution set, Opti9 is onboarding high performance, low cost cloud storage that works within all the solutions they provide.

Increasingly, companies seeking managed services support are demanding solutions made up of best-of-breed providers. While traditional cloud platforms work against this principle of interoperability, Backblaze and solution providers like Opti9 are committed to delivering cloud solutions without the limitations, complexity, and high pricing that are holding businesses back.

As Jim Stechyson, the President of Opti9 put it:

Backblaze and Opti9 focus on empowering businesses with the best cloud solutions available. Being able to integrate the high performance and low total cost of ownership of Backblaze’s object storage into our set of solutions will greatly enhance our ability to drive success for our customers.

How to get started

Interested resellers or customers who want to start working with Opti9 and Backblaze today can go to the Opti9 website. Check out our joint S3-compatible hot storage offering and book your demo to get started.

For customers based in Canada, Backblaze will be opening a new data region centered in Toronto in the first quarter of 2025. As part of the partnership, Opti9 will be the exclusive provider of server backup solutions in the Canadian channel for Backblaze B2 Reserve and the Powered by Backblaze program.

More about the new Backblaze data region in Canada

The new Canadian data region gives businesses the freedom to access Backblaze’s open, interoperable cloud solution, while still allowing customers to benefit from local storage and compliance. Located in Toronto, Ontario, the data center has been assessed and maintains a security program that addresses the requirements of SOC 1 Type 2, SOC 2 Type 2, ISO 27001, PCI DSS, and HIPAA. The region will be available to customers in the first quarter of 2025.

If you’d like to receive notifications about the data region opening date and when you can start storing data in Canada, you can sign up for the waitlist today.

The post Backblaze Partners with Opti9 and Adds Canadian Data Region appeared first on Backblaze Blog | Cloud Storage & Cloud Backup

SaaS and the Shared Responsibility Model: A Guide to Protecting Your Data

2024-10-24 Vinodh Subramanian

Post Syndicated from Vinodh Subramanian original https://www.backblaze.com/blog/saas-and-the-shared-responsibility-model-a-guide-to-protecting-your-data/

A decorative image showing a Venn diagram with a cloud overtop it.

Were you the person who stayed up until 2 a.m. to finish the group project? If you, like me, burned the midnight oil to save the team from utter failure, you suffered from a breakdown of shared responsibility. No one knew who was supposed to do what.

The same breakdown applies when you don’t fully understand the “shared responsibility model” that most software as a service (SaaS) platforms use when it comes to your data. You might assume that, because it’s in the cloud, your SaaS data is protected automatically. In reality, SaaS companies are only responsible for maintaining their uptime, not for retaining your files and critical information in case you need to get back online—and this has big implications for how you protect your data, ensure compliance, and optimize system performance.

Today, I’m diving into what this model means and how it affects how you use SaaS platforms.

What is the shared responsibility model?

The shared responsibility model defines the division of duties between a SaaS provider and its customers. It delineates which aspects of the system the provider manages and what tasks remain under the customer’s control. The primary goal is to clarify roles and reduce any ambiguity about who is responsible for certain aspects of security, data integrity, and system maintenance.

Defined roles, reduced ambiguity. That all sounds great to me, but what do SaaS providers actually take responsibility for? And what are you responsible for?

SaaS provider responsibilities

First and foremost, SaaS providers are responsible for ensuring that the application and its underlying infrastructure (servers, networking, data centers) are secure. This includes physical security, network protection, patching the platform, and overall system integrity. They typically guarantee a certain level of service availability, often formalized in a service level agreement (SLA). Downtime, system performance, and platform updates fall within the vendor’s scope.

Practically speaking, that means that they may not back up your data as often as you would like or archive it for as long as you need. SaaS vendors do not concern themselves with fully protecting your files. Most importantly, they may not offer a timely recovery option if you lose the data, which is critical to getting your business back online in the event of an outage.

Customer responsibilities

SaaS providers and cloud drives typically take responsibility for the security “of” the cloud, including the infrastructure that runs all of the services offered in the cloud. On the other hand, the customers are responsible for security “in” the cloud. This means customers must manage the security of their own data.

What’s the difference? Let’s use an example I’ve come across many times. If a user inadvertently uploads a ransomware-infected file to a cloud drive like Google Drive or OneDrive, the service might protect the integrity of the cloud infrastructure, ensuring the malware doesn’t spread to other users. However, the responsibility to prevent the upload of the infected file in the first place, and to manage its consequences, falls directly on the user. In essence, while cloud drives provide a platform for storing your data, relying solely on them without understanding the nuances of the shared responsibility model could leave gaps in your data protection strategy.

Customer responsibilities include, among others:

Data protection: While the provider secures the infrastructure, you are responsible for securing the data you upload, manage, and store within the platform. SaaS platforms may replicate data and have redundancy safeguards in place to ensure you can access your data through the platform reliably, but they do not assume responsibility for their users’ data. It’s up to you to ensure your data is backed up according to your needs and policies.
Access management: You are responsible for controlling who has access to the SaaS environment. This involves creating strong user authentication processes, managing roles and permissions, and ensuring that the right people have access to the right information.
Compliance: Even if the SaaS vendor is compliant with, say, HIPAA or GDPR standards, you are also responsible for ensuring that you’re using the platform in accordance with those standards.

Here’s a diagram that shows how shared responsibility breaks down for Microsoft 365 as just one example:

A diagram of the shared responsibility model.

When the shared responsibility model matters

Unfortunately, I’ve found the shared responsibility model can create a false sense of security because understanding your responsibilities as a customer is often a process of elimination. SaaS responsibilities may be hard to track down, and when you can find them, they won’t say “you need to handle backups.” They’ll list what the provider handles, and all the rest is up to you.

When does this become important for you?

Security breaches: Many security incidents occur because of a misunderstanding of this model. For example, if a company assumes their SaaS provider is responsible for data encryption and user access control when, in fact, the company is, this can lead to critical vulnerabilities. A lack of clarity can expose businesses to data breaches, financial losses, and reputational damage.
Compliance issues: Regulatory compliance is another area that hinges on understanding shared responsibilities. Organizations that fail to implement required security measures or back up data properly can face fines, penalties, or legal consequences—even if the SaaS provider adheres to all necessary certifications.
Operational efficiency: Knowing where your responsibility starts and ends helps optimize how you use the platform. You can improve operational efficiency by focusing on the areas you control.

And this gets even more complicated the larger your business and the more involved your processes. So, if you have a business running on Google Workspace or M365, you can take something like emails and understand that Google is responsible for the email platform, but you should back up the individual emails themselves. But what about when you’re a media management company using best-of-breed tools for editing and collaboration, transcoding, asset management, and maybe even content delivery? All of those platforms have some responsibility in a shared responsibility model, and your job as a business is to understand where you are vulnerable, and then plug the gaps.

Navigating the shared responsibility model

So, what should you do with all of this information? In my experience, these are the biggest takeaways businesses can put into practice to successfully navigate the shared responsibility model:

Know your provider’s SLAs and security measures. Before adopting a SaaS solution, ensure you have a clear understanding of the vendor’s SLA and their security protocols. Understand the terms of their compliance with data privacy regulations, system availability, and disaster recovery.

What are Backblaze’s security and compliance protocols?

…is a question that would absolutely make sense for you to be asking. And I’m glad you did. Check out our Security and Compliance pages to learn more.

Educate your teams. Make sure that your internal teams are aware of their responsibilities in the shared model. Provide training on access control, data management, and security best practices to prevent accidental data exposure or misconfigurations.
Monitor and audit your usage. Set up regular audits to ensure that your organization is meeting its obligations under the shared responsibility model. Use tools to monitor access, detect unusual activity, and ensure data is being properly managed.
Make sure your backups are comprehensive. If you’re here, you’re probably well aware of this, but I can’t stress enough how important it is to back up your data, including data stored in cloud services like Microsoft 365, Google Drive, and OneDrive. Even if these services offer backups as part of the service, they may not meet your recovery needs.

How to approach the shared responsibility model

All this to say, you are ultimately responsible for backing up your data and files stored in SaaS clouds or cloud drives. The bottom line is that SaaS platforms’ top priority is to keep their own services running. By clearly understanding your role and responsibilities in this model, you can not only protect your data and ensure compliance, but also maximize the value of your SaaS investments.

The post SaaS and the Shared Responsibility Model: A Guide to Protecting Your Data appeared first on Backblaze Blog | Cloud Storage & Cloud Backup

AI 101: Classification vs. Predictive vs. Generative AI

2024-10-22 Stephanie Doyle

Post Syndicated from Stephanie Doyle original https://www.backblaze.com/blog/ai-101-classification-vs-predictive-vs-generative-ai/

A decorative image showing several buildings with digital lines flowing upward into a cloud.

It may seem like generative AI is the only game in town, or at least the only AI model worth paying attention to. But folks have been using AI models to do all kinds of things for years before ChatGPT, Claude, and Gemini came on the scene.

Today, I’m talking about the three different broadly defined categories of AI—classification, predictive, and generative—and what they’re good for.

Classification vs. Predictive vs. Generative AI Models: What’s the Diff?

Classification and predictive models have been foundational to AI for decades, powering applications like spam filters, cyber security tools, big data analysis, and demand forecasting. However, with recent advances, generative models like GPT and DALL-E have taken the spotlight, bringing up interesting existential (and legal) questions about the nature of creativity and creative work going forward. Understanding the distinctions and history of these models is key to grasping how AI continues to shape industries and innovation today.

Let’s see which category best applies to your particular problem.

AI classification models

A classification model is built to recognize, understand, and group data into preset categories. The model is first trained using training data and then evaluated using test data before being used to respond to unseen data. In general, such models infer answers for the current moment in time, for example, deciding whether an email is spam or phishing. In that case, the decision is based on comparing the incoming email to a model trained on previously classified email messages, whether those classifications were set by the user or by the platform. (The two are related, of course, as the platform’s filters often update to include aggregate user data.)
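To make the train, test, then classify loop concrete, here is a toy spam filter in Python. It is only a sketch: the naive Bayes technique, the four training messages, and the two test messages are my own illustrative choices, not how any real email platform works.

```python
import math
from collections import Counter

def train(examples):
    """Count word frequencies per label from (text, label) pairs."""
    word_counts = {"spam": Counter(), "ham": Counter()}
    label_counts = Counter()
    for text, label in examples:
        label_counts[label] += 1
        word_counts[label].update(text.lower().split())
    return word_counts, label_counts

def classify(text, model):
    """Return the label with the higher log-probability (Laplace smoothing)."""
    word_counts, label_counts = model
    vocab = set(word_counts["spam"]) | set(word_counts["ham"])
    total = sum(label_counts.values())
    scores = {}
    for label in word_counts:
        n_words = sum(word_counts[label].values())
        score = math.log(label_counts[label] / total)
        for word in text.lower().split():
            # +1 smoothing so an unseen word never zeroes out a label
            score += math.log((word_counts[label][word] + 1) / (n_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)

# Hypothetical, hand-made data: train on one set, evaluate on another.
training_data = [
    ("win free money now", "spam"),
    ("free prize claim now", "spam"),
    ("meeting agenda for monday", "ham"),
    ("lunch on monday at noon", "ham"),
]
test_data = [
    ("claim free money", "spam"),
    ("agenda for lunch meeting", "ham"),
]

model = train(training_data)
correct = sum(classify(text, model) == label for text, label in test_data)
accuracy = correct / len(test_data)
print(f"accuracy on held-out test data: {accuracy:.0%}")
```

Once the held-out accuracy looks acceptable, the same classify call is what you would run on unseen messages. A real filter would use far more data and a more capable model.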

In business, classification models drive applications like spam detection, customer segmentation, and fraud detection. Healthcare uses classification models to diagnose diseases based on medical images or patient data. In finance, they help identify high-risk transactions. Social media platforms rely on these models to filter content, detect hate speech, and recommend posts. Overall, classification models are key to organizing large datasets efficiently and making decisions based on patterns, helping automate and optimize numerous industry processes.

AI prediction models

Predictive AI models utilize historical data, patterns, and trends to train the model, so they can be used to make informed decisions about future events or outcomes. Using Drive Stats as an example, we could theoretically build a model that, when given data about a particular drive model and failure rates, predicts the chance that a given hard drive will fail in the next 90 days. Predictive AI models typically require large amounts of data to be trained and are computationally expensive to generate.

Predicting Hard Drive Failure Rates with AI

Okay, we were being coy when we said “example.” Check out Andy Klein’s Tech Day 2024 presentation, “Predicting Hard Drive Failure Rates with AI” to see how this kind of predictive model works.

AI prediction models help predict customer behavior, sales trends, and demand, aiding in decision making and resource planning. In finance, these models are crucial for stock price forecasting, risk assessment, and credit scoring. Healthcare utilizes prediction models for patient outcome predictions, disease progression, and treatment effectiveness. They are also applied in weather forecasting, supply chain optimization, and energy usage management. By analyzing past data, prediction models provide insights that help organizations anticipate trends, make proactive decisions, and optimize performance across various industries.

Generative AI models

You know this one. Generative AI is about creating (sort of) new content. It uses neural networks, deep learning, and other techniques to infer and generate content based on patterns it observes in existing content, all while mimicking the style and structure requested. Image generators such as DALL-E and Stable Diffusion, and large language models like ChatGPT, Claude, and Gemini, are easily accessible AI applications that have brought AI into the public eye.

Generative AI is at turns the thing that will revolutionize everything, a scary specter with near-sentience that will steal your job, or a big hallucinating fluke that tells you to put glue on pizza. There are some pretty cool use cases—for one, researchers are using generative AI for new drug discovery. But you’re most likely to run into generative AI in the following use cases: customer service chatbots, coding assistants, marketing support, and general business assistants that generate transcripts and summaries.

Unlocking the power of AI

Even with all the current hype around generative AI, we are still in the early stages of development when it comes to AI systems, given that they are most useful in responding to queries about the subject matter on which they were trained.

For example, an AI model trained to play chess might find playing checkers to be difficult. While the board and the number of players are the same, can a chess-playing AI model infer the allowed checkers moves based on its understanding of chess? Even generative AI models like ChatGPT, which are trained on a wide variety of subjects, are still lacking a key ingredient to be truly useful to your organization: your data.

An AI chatbot, for example, isn’t going to perform the way you want it to without being powered by your organization’s data. And, how do you build an AI powered tool while keeping your private data private? We started to explore that very question in a recent webinar, “Leveraging your Cloud Storage Data in AI/ML Apps and Services.”

Tune in to learn more about the various ways AI/ML applications use and store data and get insights from our customers who leverage Backblaze B2 Cloud Object Storage for their AI/ML needs.

The post AI 101: Classification vs. Predictive vs. Generative AI appeared first on Backblaze Blog | Cloud Storage & Cloud Backup

Introducing Scalable Application Keys for Enhanced Security and Performance

2024-10-15 Bala Krishna Gangisetty

Post Syndicated from Bala Krishna Gangisetty original https://www.backblaze.com/blog/introducing-scalable-application-keys-for-enhanced-security-and-performance/

A decorative image showing keys erupting from a server.

If you work in an industry with high performance and security demands like video surveillance, internet of things (IoT), and mobile applications, a new Backblaze capability could help your cloud workflows—Scalable Application Keys. This new capability enables you to generate application keys for your Backblaze B2 Cloud Storage Buckets at 150 times the current scale. When you need high volume, short-lived application keys to upload your data to cloud storage, you can’t tolerate bottlenecks—the Scalable Application Keys feature removes them.

Today, I’m digging into the challenges solved by Scalable Application Keys, the use cases where it has the most impact, and the benefits of the feature.

Jump to the docs

Check out our documentation for more information on Scalable Application Keys and how to work with them.

The challenges

Managing large volumes of data under strict security requirements sometimes requires using application keys programmatically to interact with cloud storage buckets. In the video surveillance industry, for example, you might upload massive amounts of video footage directly from security cameras to Backblaze B2, and you need to regularly refresh the application keys used by each of those cameras to maintain a robust security posture. However, this practice has a few unique requirements:

You can’t be hampered by rate limitations to generate the volume of application keys you need.
You need high key limits sufficient for the total volume of keys.
Throughput needs to be high to generate hundreds or thousands of new application keys simultaneously.

The irregular timing of key refreshes and the scale of operations can further amplify the problem, especially when hundreds or thousands of devices request new keys at the same time.

The solution: Scalable Application Keys

With the introduction of Scalable Application Keys, Backblaze B2 customers can generate and refresh keys at significantly higher volumes and throughput—without hitting hard limits on the number of keys. This feature is designed to accommodate the unique requirements of customers who need:

High key throughput: Create keys at scale, even when thousands of devices need new keys simultaneously. You can create up to 10,000 keys per minute.
Unlimited key generation: Scale without interruption—there’s no hard cap on the number of application keys that can be generated.
Short-lived keys: Easily generate keys with very short lifespans, enhancing security without compromising functionality.
S3 compatibility: Maintain support for the Backblaze S3 compatible API, allowing you to avoid costly firmware upgrades on your devices.
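For a sense of what a per-device, short-lived key request could look like, here is a rough Python sketch. It assumes the field names of the existing B2 Native API b2_create_key call (accountId, keyName, capabilities, bucketId, namePrefix, validDurationInSeconds). This post doesn’t say whether Scalable Application Keys changes any of them, so treat that as an assumption and check the documentation. The camera naming scheme, the prefix layout, and the one-hour lifetime are hypothetical. The second helper simply turns the 10,000 keys per minute figure into a rough refresh time for a fleet.

```python
import json
import math

KEYS_PER_MINUTE = 10_000  # throughput figure quoted in the list above

def short_lived_key_request(account_id, bucket_id, device_id, ttl_seconds=3600):
    """Build a JSON-serializable request body for a per-device, upload-only key."""
    return {
        "accountId": account_id,
        "keyName": f"camera-{device_id}",        # hypothetical naming scheme
        "capabilities": ["writeFiles"],          # upload only, no read or delete
        "bucketId": bucket_id,                   # restrict the key to one bucket
        "namePrefix": f"cameras/{device_id}/",   # hypothetical per-device prefix
        "validDurationInSeconds": ttl_seconds,   # the key expires on its own
    }

def minutes_to_refresh(device_count, keys_per_minute=KEYS_PER_MINUTE):
    """Lower bound on the minutes needed to issue one new key per device."""
    return math.ceil(device_count / keys_per_minute)

body = short_lived_key_request("ACCOUNT_ID", "BUCKET_ID", "0042")
print(json.dumps(body, indent=2))
print(minutes_to_refresh(25_000), "minutes to refresh 25,000 cameras")
```

The request body is built separately from any HTTP call so you can send it with whichever client your devices or key service already use.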

Real-world applications

This feature is particularly beneficial for customers with many endpoints that all upload to Backblaze B2 cloud individually through the S3 Compatible API. For example:

Video surveillance: Companies with large networks of security cameras can now easily refresh keys for each device frequently. When you operate tens of thousands of cameras that record sensitive footage, you need to be able to refresh application keys regularly to maintain security. With Scalable Application Keys, you can handle refreshes efficiently and continue scaling the number of cameras in operation without worry.

Mobile applications: Developers of mobile apps that store data in B2 Cloud Storage can generate unique keys for each user’s device. This is especially useful for apps that rely on user-generated and user-uploaded content, where each end device needs its own application key.

IoT devices: Businesses managing large fleets of IoT devices, where each device needs a unique and regularly refreshed application key, can ensure secure, individualized access to cloud storage.

Benefits of Scalable Application Keys

Enhanced security:
- Frequent key rotation becomes feasible at scale, significantly reducing the risk of unauthorized access.
- Short-lived keys minimize the window of vulnerability if a key is compromised.
- Customers can implement best practices for key management without extraneous components that could cause performance penalties.
Operational flexibility:
- Easily manage key generation for large numbers of devices, from thousands to millions.
- Accommodate sudden spikes in key requests, such as during system-wide updates or resets.
- Adapt to varying usage patterns throughout the day without hitting rate limits.
Cost effectiveness:
- Avoid expensive firmware upgrades by continuing to use the S3 Compatible API.
- Eliminate the need for complex workarounds or additional infrastructure to handle key management.

Scalable Application Keys not only solves existing limitations but also future-proofs your workflows by providing the flexibility and performance needed to scale without restriction. This feature allows you to securely manage access to B2 Cloud Storage, no matter the scale of your operations.

Ready to get started?

This feature is available upon request. If you’re an existing Backblaze B2 customer and want to get access to this capability, please contact our Support team to request access.

New to Backblaze? Contact our Sales team to learn more about how Scalable Application Keys can benefit your business and how to get started.

Once this feature is enabled, you can generate application keys at scale. Check out the documentation for more on how to use the feature.

What next?

Are you leveraging Scalable Application Keys to build more efficient and performant workflows? Share how it’s working for you so other organizations and developers can benefit from what you find. If you have any questions or feedback, please don’t hesitate to reach out to us.

The post Introducing Scalable Application Keys for Enhanced Security and Performance appeared first on Backblaze Blog | Cloud Storage & Cloud Backup

Is AI Right for Your Business? 4 Questions to Ask

2024-10-10 Stephanie Doyle

Post Syndicated from Stephanie Doyle original https://www.backblaze.com/blog/is-ai-right-for-your-business-4-questions-to-ask/

A decorative image showing several layers of computer screen folding into the cloud.

AI is everywhere—powering chatbots, generating images, even deciding what you binge watch next. It’s no wonder businesses of all sizes are feeling compelled to jump on the AI bandwagon. But before you get swept up in the AI hype, here’s the question you need to ask: Is AI right for your business and the problem you’re trying to solve?

Where AI truly becomes a change agent is when it is powered by your organization’s data to deliver relevant, insightful, and actionable observations to you in a timely manner. The reality is, while AI is really cool, without your unique data it provides your organization few competitive advantages. Of course, releasing proprietary, or even sensitive, information to a robot connected to the internet can be risky—and you want to make sure your (and your clients’) information doesn’t end up in surprising places.

But just because everyone’s talking about AI doesn’t mean it’s the magic bullet for every problem. Like any strategic investment, it takes careful consideration. So, before you hand over your data to a machine, let’s explore whether AI is really what your business needs—or if it’s just another shiny object in the tech landscape.

Where do I start?

Today, many organizations are somewhere along the AI/ML path. Most are experimenting with AI, some are actively building applications, and a handful have successfully deployed a solution. Like any other project, before you start trying to use AI in your organization, the first thing you should do is define the problem you are trying to solve. Only then can you determine if you really need AI as a part of the solution.

Ask yourself the following questions about the project. If you answer yes to all four items, the project is AI-worthy:

1. Do you want AI to replace tedious, repetitive tasks?

Start by identifying the business problem in specific, measurable terms. Determine the scope of the problem, its frequency, and the impact it has on your business. Is it recurring and time consuming? If the problem is complex, repetitive, or data-intensive, it might be suitable for AI.

2. Do you want to use AI because you can’t consistently apply a set of logical rules to answer the questions at hand?

If the problem involves large amounts of data that is difficult to process manually where the answer is derived by combining and weighing multiple factors, it may be a candidate for an AI-based solution. On the other hand, just because it can be automated doesn’t mean you need an AI solution—AI is expensive in terms of power and processing resources. If you’re running a simple routine task over and over, you might be just as well off using traditional programming methods. But, when you’re solving a complex task, you need a structure that is not a strict binary, and that’s when you might want to use AI.

3. Will you use AI for problems that humans can solve, but AI can solve much faster?

AI should help your organization solve problems it finds extremely difficult or nearly impossible to solve otherwise. AI excels at tackling complex problems that overwhelm traditional methods, such as processing vast amounts of data, recognizing intricate patterns, or making real-time predictions. If your business is facing challenges that manual processes or standard software can’t handle effectively, AI can step in to provide powerful, scalable solutions that would otherwise be out of reach.

But remember, AI should work with you, not against you. Understand how AI will integrate into your workflow and whether it aligns with your overall business strategy to avoid creating unnecessary complications or disrupting ongoing operations.

4. Do you intend for AI to increase productivity of a function or group?

Most AI projects are productivity based, even those that seem otherwise. Even AI projects aimed at improving customer experiences, like personalized recommendations, ultimately enhance productivity by streamlining interactions and reducing manual effort. At their core, most AI implementations are designed to automate tasks, optimize processes, or extract actionable insights, all of which drive greater efficiency and cost savings. And, that means you need to analyze the potential return on investment (ROI).

AI integration requires an investment in technology, data management, and often specialized personnel. Weigh the cost of implementing AI against the potential benefits it could bring. Will it save time or reduce costs? By how much? If the financial or productivity benefits outweigh the costs, AI may be a worthwhile investment.

Where to next?

Clearly defining the problem and deciding if it’s suitable for an AI-based solution is really just the first step. Once the problem is defined, you open up another set of questions around whether and how to implement it. Do you have the right data, resources, and expertise to support an AI solution? How will it integrate with your systems? How will you measure success? The answers to all of these questions should absolutely inform your decision-making, but understanding if you’re applying AI to the right problem is your starting point. Without that, you’re using a sledgehammer to crack a nut, so to speak.

The post Is AI Right for Your Business? 4 Questions to Ask appeared first on Backblaze Blog | Cloud Storage & Cloud Backup

The Fine Print: How Minimum Data Retention Fees Affect Cloud Costs

2024-10-08 Kari Rivas

Post Syndicated from Kari Rivas original https://www.backblaze.com/blog/the-fine-print-how-minimum-data-retention-fees-affect-cloud-costs/

A decorative image showing a stylized invoice with the phrase "minimum retention fees" as a line item.

You probably won’t notice a little asterisked footnote tucked at the bottom of the page the first time you read through a cloud storage vendor’s pricing tables. You probably won’t notice it the second or third time either. But you’ll definitely notice it when your bill comes in with charges for data you thought you deleted weeks ago.

That footnote explains an often overlooked challenge to your budget: minimum data retention periods. These policies, used by cloud providers like AWS, Azure, Google Cloud, and Wasabi, can lead to unexpected cost increases and complicated data management strategies.

Today, I’m breaking down cloud storage retention minimums and common scenarios where they directly impact storage budgets and data management policies.

What are minimum data retention periods?

Retention minimums specify the minimum amount of time that data must be stored before it can be deleted, overwritten, or moved to a different storage tier without incurring additional charges.

Cloud storage providers with multiple tiers like AWS or Google Cloud use minimum retention policies to ensure that customers cannot frequently move data between storage tiers to exploit lower-cost storage classes for short-term storage. For cloud providers that have a single class of storage, these policies allow providers to stabilize their resource usage and maintain predictable pricing structures.

Minimum retention periods can vary significantly between providers, and even between different storage tiers offered by the same provider. For example, AWS S3 Standard has no minimum retention period, but S3 Standard-IA has a 30 day minimum, Glacier has a 90 day minimum, and Deep Archive has a 180 day minimum.

Despite their significance, information about these retention periods is often buried in the fine print of service agreements or technical documentation.

What are delete fees?

Delete fees are a direct consequence of deleting or moving files before the retention minimum is met. Cloud providers charge these fees to recover the cost of the infrastructure they allocated for the data over the full retention period. The fee is typically prorated to cover the days remaining in the retention period that the data was expected to occupy in a storage class.
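Here is a small Python sketch of how a prorated early delete fee could be estimated. The 30-day billing month and the example price are assumptions for illustration. Each provider has its own proration rules, so check the fine print before relying on numbers like these.

```python
from decimal import Decimal

def early_delete_fee(size_tb, price_per_tb_month, min_days, days_stored, days_per_month=30):
    """Charge for the days remaining in the minimum retention period."""
    remaining_days = max(min_days - days_stored, 0)
    monthly_cost = Decimal(str(size_tb)) * Decimal(str(price_per_tb_month))
    return monthly_cost * remaining_days / days_per_month

# Hypothetical example: 10TB at $4/TB/month in a tier with a 90 day minimum,
# deleted after 30 days, leaves 60 days still to be paid for.
fee = early_delete_fee(size_tb=10, price_per_tb_month="4.00", min_days=90, days_stored=30)
print(f"early delete fee: ${fee:.2f}")
```

Data kept past the minimum produces a fee of zero, which is why the same deletion can be free in one storage class and costly in another.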

The terms “delete fees,” “minimum storage duration,” and “minimum retention fees” all refer to a similar policy.

How are delete fees incurred?

Early deletion fees can be triggered by various actions, not just the obvious deletion of files. Some examples include:

Moving data from a higher-cost tier to a lower-cost tier before the minimum retention period has been met: This scenario often catches organizations off guard when they attempt to optimize costs by transferring infrequently accessed data to a cheaper storage class.
Overwriting existing files: When a file is overwritten, the cloud provider typically treats this as a delete operation followed by a new write operation. If the original file hasn’t met its minimum retention period, the organization may be charged for the remaining time, even though they’re still using the same amount of storage space.

A decorative image showing three bars, one that represents the stored object, and two that represent what duration of days you might be charged for.

Implementation of automated lifecycle policies: Many organizations set up rules to automatically move or delete data based on its age or access patterns. However, if these policies don’t account for minimum retention periods, they can inadvertently trigger early delete fees on a large scale.
Renaming files or folders: Even seemingly benign actions like renaming files or folders can sometimes be interpreted as delete-and-rewrite operations by certain cloud storage systems, potentially triggering these fees.

Additionally, in multi-user or multi-team environments, lack of communication about retention policies can lead to unexpected charges. One team might delete or move data without realizing the financial implications for the entire organization.

The financial impact of minimum data retention periods

Minimum data retention periods, particularly in cold storage tiers, can have significant impacts on IT budgets. What may have seemed like a cost-saving storage tier can actually increase expenses when operations require frequent deletions or movements of data before the minimum retention period is over. But even in hot storage, these policies can unexpectedly inflate overall costs.

To illustrate the real-world impact of retention minimums, let’s examine a few common scenarios:

1. Backup strategy

Let’s say you have a 30 day backup strategy for your critical infrastructure, and you opt for Wasabi object storage to save costs vs. AWS. You plan to keep a month’s worth of backups in the cloud and will then replace them with the newer backups.

Wasabi’s minimum retention policy is 90 days for its Pay as You Go storage (and 30 days for its Reserved Capacity Storage).

You store an initial 50TB of backups in Wasabi on Day 1. On Day 31, the older backup is deleted and replaced with the newer backup. So, you incur costs for 30 days of Timed Active Storage (50TB) and 60 days of Timed Deleted Storage (50TB). These charges are incurred every time the backup is replaced.

With Wasabi’s Pay as You Go storage, your monthly bill will look like this:

50TB x $6.99/TB/month x 3 = $1,048.50

We multiply by 3 because the 90 day minimum retention policy equals three months’ time. You pay for one month because you actually stored the data, and for the other two because you replaced your backups with new ones before the minimum was met.

Compare this to Backblaze B2 Cloud Storage, which has no minimum retention policy and costs $6 per TB/month for its Pay as You Go storage:

50TB x $6/TB/month = $300

The minimum retention policy effectively triples the anticipated storage expenses. When scaled across multiple backup sets or extended periods, the impact on the IT budget can be substantial.
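The arithmetic above can be reproduced with a few lines of Python, which makes it easy to plug in your own backup size and rotation schedule. This is a simplified sketch: it assumes a steady state where one backup set is replaced every keep_days, and it ignores any other charges. The function name and parameters are mine.

```python
from decimal import Decimal

def steady_state_monthly_bill(size_tb, price_per_tb_month, keep_days, min_retention_days=0):
    """Monthly storage bill when each backup set is replaced every keep_days."""
    # Each set is billed for whichever is longer: how long you keep it,
    # or the provider's minimum retention period.
    billable_days = max(keep_days, min_retention_days)
    multiplier = Decimal(billable_days) / Decimal(keep_days)
    return Decimal(size_tb) * Decimal(price_per_tb_month) * multiplier

with_minimum = steady_state_monthly_bill(50, "6.99", keep_days=30, min_retention_days=90)
without_minimum = steady_state_monthly_bill(50, "6", keep_days=30)

print(f"90 day minimum: ${with_minimum:,.2f}")
print(f"no minimum:     ${without_minimum:,.2f}")
print(f"difference:     ${with_minimum - without_minimum:,.2f} per month")
```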

Delete fees in the real world: California university switches to Backblaze to eliminate surprise bills from Wasabi

Cal Poly Humboldt thought they understood cloud storage provider Wasabi’s pricing, but each month brought unexpected charges for deleted data due to Wasabi’s minimum storage retention policies. This, in turn, caused a chain reaction of calls from the procurement office, buying extra capacity, and then modifying the system to try to avoid further bills. To silence the monthly fire alarms, they switched to Backblaze.

With no retention minimums, Cal Poly Humboldt now knows exactly what their Backblaze costs will be up front. The move was so smooth that they migrated another 100TB from Google’s no-longer-free tier for educational institutions and plan to scale their storage to over a petabyte to back up and safeguard research data.

2. Application storage

In application storage use cases, retention minimums can impact cloud spend significantly when the data has a short lifecycle. Applications with high transaction volumes—such as e-commerce, user-generated content applications, or surveillance platforms—frequently upload and delete as part of their daily operations.

For example, most video surveillance platforms may only need 30 days of history for footage that’s been uploaded and processed, so something like a 90-day retention period doesn’t make financial or operational sense. E-commerce customers can also be affected; these businesses have users that frequently upload and delete content to manage storefronts, creating unpredictable data usage patterns. In these cases, you are at the mercy of your end users—if users churn through files quickly you will pay the retention penalties.

3. Video production

Retention minimums also affect video production workflows, particularly when you need to make revisions once a project has been archived in cold storage, a common workflow many studios and broadcasting agencies use to get more affordable storage rates for seldom-accessed data.

Whether due to last minute changes in branding, edits to visuals, or adjustments to sound, the project needs to be pulled from storage for further modification. Because the files were moved to colder storage under a 90 day retention policy, accessing and modifying them before that period ends can trigger significant early delete fees.

If you routinely archive files immediately after a project completes anticipating that no further changes will be required, these early delete fees can add up quickly.

The hidden complexities of minimum data retention periods

Retention minimums can significantly impact your bottom line. These policies, often buried in the fine print, can lead to unexpected costs and complicate data management strategies across various industries.

Understanding the nuances of minimum data retention periods and their associated costs is crucial for developing an effective and economically sound cloud storage strategy. It enables organizations to make more informed decisions, avoid unexpected expenses, and better align their storage choices with their specific data management needs and budget constraints.

The post The Fine Print: How Minimum Data Retention Fees Affect Cloud Costs appeared first on Backblaze Blog | Cloud Storage & Cloud Backup

Backblaze B2 Event Notifications Now Generally Available

2024-10-03 Bala Krishna Gangisetty

Post Syndicated from Bala Krishna Gangisetty original https://www.backblaze.com/blog/using-b2-event-notifications/

A decorative image showing a cloud, gears, and an alarm notification.

No one likes being left out in the cold, least of all your data. With Backblaze B2 Event Notifications—now generally available—you can receive real-time notifications about object changes. That means that you can build more responsive and automated workflows across best-of-breed cloud platforms, saving time and money and improving your end users’ experiences. And, you can be alerted to changes in your data that may speed time to action.

Here’s how it works: With Backblaze B2 Event Notifications, any data changes within B2 Cloud Storage—like uploads, updates, or deletions—can automatically trigger actions in a workflow, including transcoding video files, spooling up data analytics, delivering finished assets to end users, and many others. Importantly, unlike many other solutions currently available, Backblaze’s service doesn’t lock you into one platform or require you to use legacy tools from AWS.

So, to businesses that want to create an automated workflow that combines different compute, content delivery network (CDN), data analytics, and other cloud services: Now you can, with the bonus of cloud storage at a fifth of the rates of other solutions and free egress.

Key capabilities

Flexible implementation: Event Notifications are sent as HTTP POST requests to the desired service or endpoint within your infrastructure or any other cloud service. This flexibility ensures seamless integration with your existing workflows. For instance, your endpoint could be Fastly Compute, AWS Lambda, Azure Functions, or Google Cloud Functions, etc.
Event categories: Specify the types of events you want to be notified about, such as when files are uploaded and deleted. This allows you to receive notifications tailored to your specific needs. For instance, you have the flexibility to specify different methods of object creation, such as copying, uploading, or multipart replication, to trigger event notifications. You can also manage Event Notification rules through the UI or the API.
Filter by prefix: Define prefixes to filter events, enabling you to narrow down notifications to specific sets of objects or directories within your storage on Backblaze B2. For instance, if your bucket contains audio, video, and text files organized into separate prefixes, you can specify the prefix for audio files in order to receive Event Notifications exclusively for audio files.
Custom headers: Include personalized HTTP headers in your Event Notifications to provide additional authentication or contextual information when communicating with your target endpoint. For example, you can use these headers to add necessary authentication tokens or API keys for your target endpoint, or include any extra metadata related to the payload to offer contextual information to your webhook endpoint, and more.
Signed notification messages: You can configure outgoing messages to be signed by the Event Notifications service, allowing you to validate signatures and verify that each message was generated by Backblaze B2 and not tampered with in transit.
Test rule functionality: Validate the functionality of your target endpoint by testing Event Notifications before deploying them into production. This allows you to ensure that your integration with your target endpoint is set up correctly and functioning as expected.
Retries: Event Notifications are automatically re-sent if the initial delivery attempt fails. This feature increases the reliability of Event Notifications by ensuring that temporary issues do not result in missed events, thus maintaining the integrity of your event-driven workflows.
Delivery: Event Notifications are designed to provide an at-least-once delivery guarantee, so notifications are delivered reliably even in the presence of network or system failures.
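Two of these capabilities shape how a receiving endpoint should be written: signed messages mean you can reject anything that wasn’t produced with your signing secret, and at-least-once delivery means the same event may arrive more than once. The Python sketch below shows one way to handle both. The header format ("v1=" followed by a hex HMAC-SHA256 of the request body) and the payload fields (events, eventId, eventType, objectName) are assumptions based on my reading of the Event Notifications documentation, so verify them there. The secret and sample payload are made up.

```python
import hashlib
import hmac
import json

def signature_is_valid(body: bytes, header_value: str, secret: str) -> bool:
    """Check a 'v1=<hex digest>' signature header against the raw request body."""
    version, _, received = header_value.partition("=")
    if version != "v1":
        return False
    expected = hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected.encode(), received.encode())

def new_events(body: bytes, seen_ids: set) -> list:
    """Return only events not processed before; redeliveries are dropped."""
    fresh = []
    for event in json.loads(body).get("events", []):
        event_id = event["eventId"]
        if event_id not in seen_ids:
            seen_ids.add(event_id)
            fresh.append(event)
    return fresh

# Simulated delivery with a hypothetical secret and payload.
secret = "example-signing-secret"
body = json.dumps({"events": [{
    "eventId": "evt-1",
    "eventType": "b2:ObjectCreated:Upload",
    "objectName": "backups/db.tar",
}]}).encode()
header = "v1=" + hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()

seen = set()
if signature_is_valid(body, header, secret):
    print("first delivery:", len(new_events(body, seen)), "new event(s)")
    print("redelivery:", len(new_events(body, seen)), "new event(s)")
```

In production the set of seen IDs would live in a database or cache rather than in memory, so that deduplication survives restarts.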

Versatile use cases

This past April, we announced Event Notifications in preview, and folks have put Event Notifications to work in some incredible ways. Today, we’re sharing some of the key use cases that came out of the preview to simplify your own workflows so you can focus on extracting insights from your data, rather than managing the logistics of data processing.

A diagram describing how Event Notifications work.

Automated media processing

Video transcoding: Many customers use Event Notifications to automate their video transcoding workflows. When a new video is uploaded to a Backblaze B2 Bucket, an Event Notification can trigger a transcoding process that generates the video in each desired format.

Image processing: Similarly, customers also use Event Notifications to set up automated image processing pipelines, such as generating thumbnails or applying filters when new images are added to a Backblaze B2 Bucket.

Media processing is not limited to video transcoding or image processing. It can be extended to any other media processing workflow, minimizing the number of steps in the workflow.
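
As a rough illustration of the pattern, a webhook receiver might map each newly created object to a processing pipeline by file extension. This is a sketch only: the pipeline names and the extension table are hypothetical, and the payload fields ("events", "eventType", "bucketName", "objectName") and the "b2:ObjectCreated:" prefix are assumptions to confirm against the Event Notifications documentation.

```python
from pathlib import PurePosixPath

# Hypothetical mapping from file extension to a pipeline name.
PIPELINES = {
    ".mp4": "transcode",
    ".mov": "transcode",
    ".jpg": "thumbnail",
    ".png": "thumbnail",
}


def plan_jobs(payload: dict) -> list:
    """Return (pipeline, bucket, object_name) tuples for newly created media."""
    jobs = []
    for event in payload.get("events", []):
        # Assumed naming: object creation events start with "b2:ObjectCreated:".
        if not event.get("eventType", "").startswith("b2:ObjectCreated:"):
            continue
        name = event.get("objectName", "")
        pipeline = PIPELINES.get(PurePosixPath(name).suffix.lower())
        if pipeline:
            jobs.append((pipeline, event.get("bucketName"), name))
    return jobs
```

The function only decides what work to queue. Handing the resulting jobs to a transcoder or image service is left to whatever job system you already run.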

Backup monitoring

With Event Notifications, customers can receive a notification when a backup is successfully uploaded to a Backblaze B2 Bucket, providing peace of mind and confirming that data is protected. Whether you want to track nightly or monthly backups, you can get a notification when they complete.
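
One way to act on those notifications is to compare the uploads you received against the backups you expected. The sketch below assumes each backup job writes under its own object-name prefix and that payloads carry "events" with "eventType" and "objectName" fields. The prefixes and the function name are hypothetical.

```python
def missing_backups(expected_prefixes, payloads):
    """Return the expected backup prefixes for which no upload event was seen."""
    uploaded = [
        event.get("objectName", "")
        for payload in payloads
        for event in payload.get("events", [])
        # Assumed naming: object creation events start with "b2:ObjectCreated:".
        if event.get("eventType", "").startswith("b2:ObjectCreated:")
    ]
    return [
        prefix
        for prefix in expected_prefixes
        if not any(name.startswith(prefix) for name in uploaded)
    ]
```

Running a check like this at the end of each backup window turns a missing notification into an alert instead of a surprise during a restore.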

Presigned URL monitoring

Using a presigned URL is a standard way to share a file without giving full access to your Backblaze B2 Bucket. Customers are using Event Notifications to know when their clients upload files via presigned URLs to Backblaze B2, getting a callback that confirms the upload is complete.

Security and access control

Unauthorized access detection: Customers are using Event Notifications to track access to highly confidential video files and report back to their clients as needed. Event Notifications help them detect any unauthorized access and take immediate action.

Audit trails: Some customers are using Event Notifications to create a detailed audit log of supported bucket activities, which is useful for compliance and security purposes.

Anomaly/malware detection: Event Notifications can strengthen security by notifying you of changes to Backblaze B2 Buckets, helping you detect unusual activity such as malware that deletes or overwrites backups.
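
A simple form of anomaly detection is to flag bursts of deletions. The following sketch counts deletion events in a sliding time window and reports when a threshold is exceeded. The class name and thresholds are hypothetical, the "b2:ObjectDeleted:" prefix is an assumed event naming, and timestamps are assumed to arrive in non-decreasing order.

```python
from collections import deque


class DeletionBurstDetector:
    """Flag when more than max_deletes deletions occur within window_seconds."""

    def __init__(self, max_deletes: int, window_seconds: float):
        self.max_deletes = max_deletes
        self.window_seconds = window_seconds
        self._timestamps = deque()

    def record(self, event_type: str, timestamp: float) -> bool:
        """Record one event; return True if the deletion limit is exceeded."""
        if not event_type.startswith("b2:ObjectDeleted:"):
            return False
        self._timestamps.append(timestamp)
        # Drop deletions that have fallen out of the sliding window.
        while timestamp - self._timestamps[0] > self.window_seconds:
            self._timestamps.popleft()
        return len(self._timestamps) > self.max_deletes
```

A production detector would likely watch overwrites as well and tune its thresholds against normal lifecycle activity, but the same windowing idea applies.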

Integration with external systems

Database synchronization: Customers use Event Notifications to keep databases in sync with the state of their Backblaze B2 Buckets. Because their applications run on those databases, keeping data consistent across systems is critical.

Document management system: Some customers use Event Notifications with a workflow system to track document revisions, uploads, and deletes, or to notify team members when specific documents are uploaded or deleted.

Analytics and reporting

Performance analytics: Some customers use Event Notifications to monitor their backup performance and completion times, helping to optimize their data management strategies.

Usage tracking: Event Notifications can help track storage consumption by individual users or projects, facilitating better resource management and cost allocation.

These are just a few of the use cases our preview customers shared with us, and the sky is truly the limit for ways Event Notifications can empower you to simplify and streamline your workflows.

Ready to get started?

For existing customers working with a Backblaze account manager, Event Notifications is enabled for you today. If you need assistance, your account manager is happy to help.

For existing customers who are not currently working with an account manager, please contact our Support team to request access.

New to Backblaze? Contact our Sales team to learn more about how Event Notifications can benefit your business and how to get started.

A screenshot of the Backblaze account screen where you can enable Event Notifications.

Once Event Notifications is enabled on your account, log in to your Backblaze B2 account, navigate to the Buckets page, and click on the Event Notifications section. From there, you can set up notification rules for the events you want to track or configure notifications through our API.

For detailed instructions and best practices, check out the Event Notifications documentation.

What’s next?

Please do share how you’re leveraging Event Notifications to build more efficient, automated, and responsive workflows so that other organizations and developers can benefit from what you find. If you have any questions or feedback, please don’t hesitate to reach out to us.

The post Backblaze B2 Event Notifications Now Generally Available appeared first on Backblaze Blog | Cloud Storage & Cloud Backup

The Cloud Storage Playbook: 4 Best Practices for Sports Teams

2024-10-02 Laquie Campbell

Post Syndicated from Laquie Campbell original https://www.backblaze.com/blog/the-cloud-storage-playbook-4-best-practices-for-sports-teams/

Video and data are the lifeblood of sports teams and leagues, fueling everything from fan engagement to game analysis.

To keep operations running smoothly, sports teams need to ensure that assets are stored securely, managed cost effectively, and kept ready for quick access. Cloud storage is increasingly part of sports teams’ data management playbooks, integrating with existing workflows and media tools so that teams can stay sharp and keep fans engaged.

Let’s break down what’s driving data growth in the sports market, use cases for cloud storage, and four best practices you can use to adopt cloud storage in a hybrid approach.

What’s driving data growth for sports teams?

During a given game, teams typically capture multiple camera angles, including sideline and aerial views, along with player-specific footage. Inside the stadium, teams use video and data to create an immersive fan experience, with big-screen displays and other screens showing player profiles, replays, real-time stats, and more. The action doesn’t stop there. Live feeds and exclusive content delivered on mobile devices add interactivity, bringing the game closer to the audience.

Sports teams generate a massive amount of video and image data during a game. As an example, a given professional sports game may involve around 10–12 cameras, and each can generate several terabytes of high-definition (high-def) footage over the course of the game.

High-def video files can range from 1–3GB per minute of footage, meaning a two- to three-hour game with multiple cameras might produce dozens of terabytes. On top of that, teams use high-speed cameras for slow-motion analysis, which further increases the data volume. Once you factor in still images from different angles and high-resolution (hi-res) formats, the overall image and video data generated per game can easily reach 10–20TB or more, depending on the resolution and frame rates used.
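
The per-minute figure can be turned into a rough per-game estimate. The sketch below assumes decimal units (1TB = 1,000GB) and that every camera records for the whole game at the same rate, and the function name is illustrative. With the 1–3GB per minute range and 10–12 cameras it gives low single-digit terabytes for the standard feeds, so reaching the larger totals implies higher per-minute rates or additional sources such as high-speed cameras and hi-res stills. Substitute your own measured rates.

```python
def footage_terabytes(gb_per_minute: float, game_hours: float, cameras: int) -> float:
    """Estimate total footage in TB, assuming 1TB = 1,000GB."""
    minutes = game_hours * 60
    return gb_per_minute * minutes * cameras / 1000


# Low end: 1GB/min, 2-hour game, 10 cameras -> 1 * 120 * 10 / 1000 = 1.2TB
low_estimate = footage_terabytes(1, 2, 10)

# High end: 3GB/min, 3-hour game, 12 cameras -> 3 * 180 * 12 / 1000 = 6.48TB
high_estimate = footage_terabytes(3, 3, 12)
```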

How sports organizations take advantage of cloud storage: Key use cases

Given the massive data growth in sports organizations, many teams rely on cloud storage to help them store, manage, and use that data effectively. Here’s how they do it.

Replacing aging on-premises systems

Professional sports teams have long relied on on-premises storage like LTO tape systems or servers to keep their game footage, player performance data, and other critical content safe. But as time goes on, these systems become harder to maintain, prone to breakdowns, and outmatched by the growing volume of data. As media and data continue to pile up, teams need storage that can scale fast without requiring a major investment in new infrastructure.

By using cloud storage—typically through a hybrid infrastructure that utilizes both cloud and on-premises systems—sports organizations can off-load some of the hassle of maintaining and upgrading aging physical systems. Cloud storage eliminates the need for constant hardware replacements, freeing up IT teams to focus on more strategic plays.

Eagles retire LTO, drafting up an active cloud archive

With multiple championships behind them, the Philadelphia Eagles had decades of incredible content to mine and protect, but they needed to draft and train up some new technical assets to stay in contention. They retired their LTO-6 system and shifted hundreds of terabytes off of their storage area network (SAN) to a true cloud archive in Backblaze B2 Cloud Storage. Check out their game plan for protecting data and improving media workflows in the cloud.

Enhancing video management and distribution

By implementing cloud storage for hot archives, a league or team can store all video content in a centralized repository that offers instant access from anywhere, especially when paired with cloud-friendly media asset management (MAM) tools.

Cloud storage simplifies the process of sharing large video files with players, broadcasters, and media outlets, boosting an organization’s ability to monetize its content.

Backblaze B2 Live Read changes the game

Advanced services like Live Read give teams the ability to access, edit, and transform media as it’s uploaded. This speeds up content retrieval for analysis, editing, and distribution, making it especially useful on game days, when quick access to video and analytics can influence real-time decisions and help create up-to-the-minute content.

Business continuity and disaster recovery

Keeping sensitive data and high-value media safe is nonnegotiable for sports organizations. A natural disaster, cyberattack, data breach, or other threat to stored data and media can cause days or weeks of downtime, making critical assets inaccessible and leading to significant operational disruptions.

Teams are using cloud storage to create geographic redundancy that ensures that data stays secure and recoverable even in the event of a local disaster. Tools like Object Lock add an extra layer of protection, making sure that data can’t be tampered with or deleted.

Integrating AI capabilities

AI is employed by sports teams to automate video analysis and content tagging, create highlight reels almost instantly, and scale personalization efforts.

Using the cloud to implement AI for sports media makes sense thanks to its scalability, processing power, and accessibility. Cloud platforms can handle vast amounts of video data and provide the computational resources necessary for AI-driven tasks like real-time analysis, high-speed editing, and video rendering. The cloud also enables collaboration across multiple locations, allowing teams, coaches, and analysts to access and process data seamlessly. Cloud-based AI is cost efficient, as teams only pay for the resources they use, avoiding the high costs of maintaining dedicated on-premises AI infrastructure.

Leverage video understanding with Twelve Labs

Twelve Labs’ video understanding platform allows you to build AI functionality into your workflows, giving you the ability to automate metadata tagging and search video archives with natural language. Check out how it integrates with cloud storage in Backblaze B2.

Optimizing costs

As traditional storage systems scale up, they can become prohibitively expensive—not only in direct costs, but also in ongoing maintenance and management. Cloud storage is inherently scalable, capable of handling growing volumes of content and data without breaking the bank.

Cloud storage helps sports teams optimize data storage costs by offering scalable, flexible pricing models that align with their data needs. Depending on their needs, teams can choose to pay for the exact amount of storage they use or to leverage capacity-based storage plans, but in either case, they’ll avoid the need for expensive on-premises hardware that often requires over-provisioning.

With cloud storage, teams can dynamically scale storage up or down based on their requirements, such as during the season when video data surges, though it’s essential to factor costs like egress fees into the calculation. Backblaze, for example, includes free egress up to three times the average amount of data you store each month, which can reduce costs significantly.
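
To see how an egress allowance affects a cost estimate, here is a minimal sketch. It assumes that "3x free egress" means downloads are free up to three times the average amount of data stored in the month. The per-gigabyte price is a parameter you supply from your provider’s current pricing, and the function names are illustrative.

```python
def billable_egress_gb(avg_stored_gb: float, egress_gb: float,
                       free_multiple: float = 3.0) -> float:
    """Return the egress volume that exceeds the free allowance."""
    free_allowance_gb = free_multiple * avg_stored_gb
    return max(0.0, egress_gb - free_allowance_gb)


def egress_cost(avg_stored_gb: float, egress_gb: float,
                price_per_gb: float, free_multiple: float = 3.0) -> float:
    """Return the egress charge for the month at the given per-GB price."""
    return billable_egress_gb(avg_stored_gb, egress_gb, free_multiple) * price_per_gb
```

For example, a team storing 100TB on average that downloads 250TB in a month stays inside the allowance under this assumption, while 350TB of downloads leaves 50TB billable.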

Hybrid cloud storage for sports teams

Many organizations take a phased approach in embracing cloud storage, or choose to continue leveraging on-premises storage infrastructure along with new cloud storage resources in a hybrid model. As with any deployment of new technology, this process is best undertaken with a thoughtful game plan.

Four best practices for adopting hybrid cloud storage

1. Assess your current infrastructure

Begin by auditing the on-premises storage systems you currently rely on to maintain team footage and data. Knowing where your storage infrastructure falls short, you can set clear objectives for a hybrid solution, such as increased accessibility or more cost-effective scaling options, and then map out your shift to the cloud. Evaluate capacity, performance, and scalability limits to help identify pain points (e.g., slow access to media files, high costs) and inform prioritization of the data or content that should move to the cloud.

2. Prioritize media for migration

Depending on your goals, you’ll prioritize different media for a cloud migration. For example, if your goal is to modernize your archive and make it more accessible for monetization, it makes sense to move archives off LTO to an active cloud archive. On the other hand, if your goal is streamlining remote workflows, your production data is likely first up for a cloud migration while you can maintain on-premises solutions for your archives as long as they’re serving your needs.

3. Leverage hybrid storage as a transition stage

With a cloud storage platform that integrates smoothly with existing on-prem storage and applications, you are well positioned to implement a hybrid cloud solution. A hybrid storage model allows you to shift operations to the cloud gradually, without the need for an abrupt overhaul. As you navigate this transition, your team can begin to take advantage of cloud scalability and flexibility without abandoning familiar workflows or compromising performance.

4. Establish clear data management policies

Structured data management helps prevent inefficiencies, such as duplicated or misplaced files, and ensures that storage solutions align with operational needs. Create clear policies for where media and data are stored (on-prem or cloud), when and how each should be moved, and which users have access (and at what level).

Preparing for the future

As sports organizations continue to generate and rely on massive amounts of video and data, cloud storage is increasingly becoming a strategic necessity. By embracing cloud storage, teams and leagues can increase their efficiency, improve fan engagement, enhance performance analysis, and ensure operational continuity—all while optimizing costs and future-proofing their infrastructure. The result? More streamlined, secure, and scalable storage that supports long-term success on and off the field.

The post The Cloud Storage Playbook: 4 Best Practices for Sports Teams appeared first on Backblaze Blog | Cloud Storage & Cloud Backup

Is Your Data Really Safe? How to Test Your Backups

2024-09-26 David Johnson

Post Syndicated from David Johnson original https://www.backblaze.com/blog/is-your-data-really-safe-how-to-test-your-backups/

Ransomware is now a billion dollar industry, and one of the best things any business can do to protect its bottom line is to back up. But, it’s important to remember that backups are only the first step in the process—when you are affected by a ransomware attack, natural disaster, or even human error, you’ll then need to restore.

As your business scales and becomes more complex, so does your backup and restore process. You’ll have more types of data to restore, on more networks and devices, with more people involved at every step of the way.

The best way to make sure your backups are effective? Test them regularly. Let’s talk about why and how.

Good reasons to test your backups

By regularly testing your backups, you can improve your chances of a successful recovery and minimize the impact of data loss. Here are several reasons why regular backup testing is crucial:

Data integrity verification: Testing ensures that your backups are accurate and complete. A failed test might reveal corrupted files or missing data that could lead to significant losses.
Recovery process validation: By simulating the recovery process, you can identify potential bottlenecks or issues in your restoration procedures. This ensures that you can quickly and effectively recover your data in case of a disaster.
Disaster readiness assessment: Regular testing helps you assess your overall disaster recovery plan. It reveals any weaknesses or gaps that need to be addressed to ensure business continuity and to meet recovery time objectives.
Compliance adherence: Many industries have strict data retention and backup requirements. Testing helps you demonstrate compliance with these regulations.
Cyber insurance standards: Cyber insurance adoption is increasingly important for businesses, and many cyber insurance providers focus both on helping their clients prepare for ransomware attacks and recovery after the fact. As a result, many require regular backup verification testing and reporting.
Peace of mind: Knowing that your backups are reliable and tested can provide peace of mind and reduce stress during a crisis.
Early detection of issues: Testing can uncover problems with your backup software, hardware, or processes early on, allowing you to address them before they lead to more significant consequences.

In short, regular backup testing not only confirms that your data is properly backed up, but also ensures that you’re meeting recovery point objectives (RPOs), that key features like immutability are configured properly, and that your backups support overall business objectives.
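
An RPO is the maximum amount of recent data, measured in time, that you can afford to lose, so one basic check is whether the most recent successful backup is newer than the RPO allows. The sketch below shows that comparison, and the function and parameter names are illustrative.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def meets_rpo(last_backup_at: datetime, rpo: timedelta,
              now: Optional[datetime] = None) -> bool:
    """Return True if the latest successful backup is within the RPO window."""
    if now is None:
        now = datetime.now(timezone.utc)
    return now - last_backup_at <= rpo
```

Pass timezone-aware timestamps for both values, since subtracting a naive datetime from an aware one raises a TypeError.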

Ransomware and backups

In addition to the reasons above, it’s important to note that ransomware actors increasingly target backups specifically. Veeam’s 2024 Ransomware Trends Report shows that 96% of attacks target backup repositories, with the attackers successfully affecting the backups in 76% of cases. Elsewhere, Sophos reports that in instances where backups were compromised, ransom demands doubled and recovery costs were eight times higher.

How to test your backups

Testing device backups is crucial to ensure data integrity and recoverability in case of loss or damage. Here are some effective methods:

1. Manual restoration tests

Regularly restore files: Select random files from your backup and restore them to a different location. Verify that the restored files are identical to the original files.
Test system restore: If your backup includes system images, periodically restore them to a separate partition or virtual machine to ensure they function correctly.
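
A manual file restore test can be partly scripted. The sketch below picks a random sample of files from the original location and compares each one byte for byte with its restored copy. It assumes the restore target mirrors the source directory layout, and the function name is illustrative.

```python
import filecmp
import random
from pathlib import Path


def sample_and_compare(source_dir, restored_dir, sample_size, seed=None):
    """Return sampled relative paths whose restored copy is missing or differs."""
    source_dir, restored_dir = Path(source_dir), Path(restored_dir)
    files = sorted(
        p.relative_to(source_dir) for p in source_dir.rglob("*") if p.is_file()
    )
    rng = random.Random(seed)
    sample = rng.sample(files, min(sample_size, len(files)))
    failures = []
    for rel in sample:
        restored = restored_dir / rel
        # shallow=False compares file contents rather than only os.stat() data.
        if not restored.is_file() or not filecmp.cmp(
            source_dir / rel, restored, shallow=False
        ):
            failures.append(rel)
    return failures
```

Files that changed after the backup ran will also show up as differences, so run the comparison against a snapshot or against data you know has not been modified.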

2. Automated testing tools

Backup software features: Many backup solutions offer built-in testing features. These tools can automatically verify the integrity of your backups and alert you to any issues. Restore services like Cloud Instant Backup Recovery can also provide valuable insight and support before, during, and after ransomware events.
Third-party verification tools: Consider using specialized tools designed for backup verification. These tools can provide more in-depth analysis and reporting.

3. Simulated disaster scenarios

Create a test environment: Set up a simulated disaster environment, such as a corrupted hard drive or a system failure.
Attempt recovery: Try to restore your data from the backup to the simulated environment. This will help you assess the effectiveness of your backup and recovery procedures.

4. Cloud-based backup testing for different recovery scenarios

Restore workstations: If you use cloud backup for your workstations, test restoring your files to a new device. This will show the functionality of the cloud backup service and ensure that your data can be accessed and restored successfully.
Restore server or network data: In addition to endpoints, you’ll also want to restore your servers or networks to different business locations. This lets you pressure-test the cost of restores, accounting for things like hidden fees, and confirm that functions like immutability are properly configured.

5. Regular backup verification

Check file integrity: Regularly verify the integrity of your backup files using checksums or hash functions. This will help detect any corruption or damage that may have occurred.
Review backup logs: Monitor your backup logs for any errors or warnings that might indicate issues with the backup process.
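
Both checks lend themselves to a small script. The sketch below verifies backup files against a manifest of SHA-256 checksums recorded at backup time and pulls error and warning lines out of a log. The manifest format (relative path mapped to a hex digest) and the log keywords are assumptions, so adapt them to what your backup software actually produces.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so large backups do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_manifest(backup_dir, manifest):
    """Return {relative_path: problem} for files that are missing or altered."""
    problems = {}
    for rel, expected in manifest.items():
        path = Path(backup_dir) / rel
        if not path.is_file():
            problems[rel] = "missing"
        elif sha256_of(path) != expected:
            problems[rel] = "checksum mismatch"
    return problems


def flagged_log_lines(log_text, keywords=("ERROR", "WARN")):
    """Return log lines that contain any of the given keywords."""
    return [
        line for line in log_text.splitlines()
        if any(keyword in line for keyword in keywords)
    ]
```

An empty result from verify_manifest means every listed file is present and matches its recorded checksum. It says nothing about files that were never added to the manifest.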

By following these methods, you can ensure that your device backups are reliable and that you can recover your data effectively in case of a disaster.

The human element

Testing your backups also means testing the people and processes behind a recovery. That includes establishing where and how you’ll communicate if, for instance, company email is offline. It’s also important to designate incident managers to streamline decision making and to ensure that essential personnel have the access and permissions they need.

How cloud storage can help

Store your backup data in readily accessible, hot storage. This minimizes retrieval times during a disaster, enabling faster recovery of critical applications and data.

By implementing a robust backup strategy that incorporates the 3-2-1 backup rule (or the more robust, and increasingly enterprise-standard, 3-2-1-1-0 method), immutability, version control, and cloud storage, you can protect your critical data against various threats. And, by testing frequently, you can trust that those backups, and your team, are ready to get your business back online as soon as possible.
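
For reference, the 3-2-1 rule calls for three copies of your data on two different media types with one copy off-site, and 3-2-1-1-0 adds one immutable or offline copy and zero errors in backup verification. The sketch below checks an inventory of copies against both rules. The data model and names are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BackupCopy:
    media: str               # for example "nas", "tape", or "cloud"
    offsite: bool
    immutable: bool = False  # immutable (object-locked) or offline/air-gapped


def satisfies_3_2_1(copies) -> bool:
    """Three copies, at least two media types, at least one copy off-site."""
    return (
        len(copies) >= 3
        and len({copy.media for copy in copies}) >= 2
        and any(copy.offsite for copy in copies)
    )


def satisfies_3_2_1_1_0(copies, verification_errors: int) -> bool:
    """3-2-1 plus one immutable or offline copy and zero verification errors."""
    return (
        satisfies_3_2_1(copies)
        and any(copy.immutable for copy in copies)
        and verification_errors == 0
    )
```

The copy count here includes the production copy, which is how the rule is usually stated.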

The post Is Your Data Really Safe? How to Test Your Backups appeared first on Backblaze Blog | Cloud Storage & Cloud Backup
