All posts by Bala Krishna Gangisetty

A Deeper Look at S3 Compatible Lifecycle Rules in Backblaze B2

2026-02-05 Bala Krishna Gangisetty

Post Syndicated from Bala Krishna Gangisetty original https://www.backblaze.com/blog/a-deeper-look-at-s3-compatible-lifecycle-rules-in-backblaze-b2/

A decorative image showing a pattern of diamonds on a gradient background.

As data footprints swell and multi-cloud strategies become the norm, the complexity of managing object lifecycles—from initial creation to eventual archival or deletion—can introduce significant risk and operational overhead. To help our customers solve for this, we recently announced support for lifecycle rules through S3-compatible APIs.

While the previous post focused on what the feature enables and why it matters, this follow-up will look at how lifecycle rules work at a deeper level and what to keep in mind when using them in production.

Why S3 compatible lifecycle rules matter

Let’s quickly refresh your memory about why this feature matters. Many customers already rely on lifecycle rules in Backblaze B2 to manage storage costs and data retention. These rules automate actions like deleting old objects, hiding previous versions, or cleaning up incomplete multipart uploads.

By adding S3 compatible support for lifecycle rules in Backblaze B2, it allows you:

Lift and shift migrations from AWS S3 to Backblaze B2
Reuse of existing tools, xml configurations, scripts, applications & infrastructure as code
Predictable behavior across environments as part of multi-cloud strategy

API surface area and XML structure

Lifecycle rules are managed using three APIs:

PutBucketLifecycleConfiguration
GetBucketLifecycleConfiguration
DeleteBucketLifecycleConfiguration

These APIs accept and return XML documents that follow the AWS S3 lifecycle configuration format. From a client perspective, this behaves the same way it does on S3, including rule IDs, status flags, filters, and actions.

Example put lifecycle request

Below is a simple example that hides objects under the logs/ prefix after 30 days.

PUT /?lifecycle HTTP/1.1
Host: my-bucket.s3.us-west-004.backblazeb2.com
Content-Type: application/xml

<LifecycleConfiguration>
  <Rule>
    <ID>deleteOldLogs</ID>
    <Status>Enabled</Status>
    <Filter>
      <Prefix>logs/</Prefix>
    </Filter>
    <Expiration>
      <Days>30</Days>
    </Expiration>
  </Rule>

 <Rule>
   <ID>deleteOldLogs_marker</ID>
   <Status>Enabled</Status>
   <Filter>
     <Prefix>logs/</Prefix>
  </Filter>
  <Expiration>
    <ExpiredObjectDeleteMarker>true</ExpiredObjectDeleteMarker>
  </Expiration>
 </Rule>
</LifecycleConfiguration>

If the XML validates successfully, B2 stores this configuration and returns a 200 OK response.

Example get lifecycle response

GET /?lifecycle HTTP/1.1
Host: my-bucket.s3.us-west-004.backblazeb2.com

<LifecycleConfiguration>
  <Rule>
    <ID>deleteOldLogs</ID>
    <Status>Enabled</Status>
    <Filter>
      <Prefix>logs/</Prefix>
    </Filter>
    <Expiration>
      <Days>30</Days>
    </Expiration>
  </Rule>
</LifecycleConfiguration>

The response mirrors the original configuration, allowing tools like Terraform or AWS CLI to reconcile state.

Mapping from S3 rules to B2 rules

Backblaze B2 already supported lifecycle rules through the B2 Native API. The S3 compatible lifecycle implementation builds on this foundation.

Here is how the mapping works at a high level:

Each S3 lifecycle rule is converted into one or more internal B2 lifecycle rules.
Prefix filters in S3 map directly to B2 file name prefixes.
Expiration actions translate into B2 hide or delete operations.

This approach allows us to reuse the same lifecycle execution engine while exposing a familiar S3 interface.

Conceptually, the flow looks like this:

S3 XML lifecycle rule
        |
        v
XML validation
        |
        v
S3 rule compilation
        |
        v
B2 lifecycle rule objects
        |
        v
Lifecycle execution engine

Expiration behavior

Lifecycle rules can target both current and non current versions.

Some important details:

Expiring the current version results in a hide marker being created, matching S3 behavior.
Non current version expiration deletes older file versions after the configured number of days.
B2 keeps the same guarantees around version ordering and visibility that exist in the native API.
To remove abandoned hidemarkers by default, an ExpiredObjectDeleteMarker rule for expiration by days is a must.
The ExpiredObjectDeleteMarker rule ID is derived from the expiration by days rule ID with a “_marker” suffix.

Understanding this behavior is important if you rely on object versioning for recovery or audit purposes.

Here is an example to remove older versions of objects with prefix data/ that are hidden for more than 90 days.

<Rule>
  <ID>delete-old-versions</ID>
  <Status>Enabled</Status>
  <Filter>
    <Prefix>data/</Prefix>
  </Filter>
  <NoncurrentVersionExpiration>
    <NoncurrentDays>90</NoncurrentDays>
  </NoncurrentVersionExpiration>
</Rule>

How this behaves internally

At evaluation time, B2 walks object versions in order and applies rules like this:

Pseudo code:

for each object in bucket:
  for each version of object with data prefix:
    if version is non current:
      if age_in_days >= NoncurrentDays:
        delete version

For current versions, expiration works differently.

Expiring the current version

When a lifecycle rule expires the current version, B2 creates a hide marker instead of deleting data immediately.

This matches S3 semantics and preserves version history.

current version expired
        |
        v
hide marker created
        |
        v
current version becomes non current

Overlapping and nested prefix rules

Backblaze B2 supports multiple lifecycle rules on the same bucket, including overlapping and nested prefixes.

For example, you might configure:

A general rule for logs/ that deletes objects after 365 days
A more specific rule for logs/audit/ that deletes objects after 90 days

When rules overlap, the configuration with lowest value is applied to save cost to you. This mirrors S3 behavior and allows fine grained control without duplicating buckets.

Matching logic:

object path: logs/audit/2024/01/file.json

matching rules:
- logs/
- logs/audit/

selected rule:
- logs/audit/ because it has lowest value, 90 days

Multipart upload cleanup

Lifecycle rules can also be used to abort incomplete multipart uploads if not completed in a certain number of days. This is implemented by tracking the initiation time of multipart uploads and periodically removing uploads that exceed the configured threshold. This helps prevent abandoned uploads from consuming storage indefinitely.

Below is an example rule to delete multipart uploads that are not completed after seven days after starting the upload process.

<Rule>
  <ID>abort-multipart</ID>
  <Status>Enabled</Status>
  <AbortIncompleteMultipartUpload>
    <DaysAfterInitiation>7</DaysAfterInitiation>
  </AbortIncompleteMultipartUpload>
</Rule>

Internal execution model

Multipart uploads are tracked with their initiation timestamp.

Pseudo code:

for each multipart_upload:
  if now - initiation_time >= DaysAfterInitiation:
    abort upload
    delete uploaded parts

Lifecycle execution model

Lifecycle rules are not executed in real time. Instead, they are evaluated by background processes that scan eligible objects and apply actions asynchronously.

Important characteristics:

Rules are not evaluated in real time
Execution is eventually consistent
Large buckets may take multiple passes to fully apply changes

This design allows lifecycle processing to scale across billions of objects efficiently without impacting foreground operations.

High level execution flow:

scan bucket periodically
   |
   v
identify eligible objects
   |
   v
apply lifecycle actions (hide and/or delete)
   |
   v
record progress and continue

Error handling and validation

Lifecycle XML is validated using S3 compatible rules:

Invalid combinations of actions are rejected
Missing required fields return errors
Unsupported actions return descriptive failures

While error messages may differ slightly from AWS, the validation logic follows the same constraints.

Compatibility considerations

While the goal is strong S3 compatibility, there are still some things to keep in mind:

Not all S3 lifecycle actions are supported yet
XML validation follows S3 rules, but error messages may differ slightly
The underlying storage model of B2 can affect timing and visibility

We recommend testing lifecycle rules in a non production bucket before rolling them out broadly.

Final thoughts

Adding S3 compatible support for lifecycle rules on Backblaze B2 was all about making migrations simpler and letting customers reuse existing automation with confidence. It behaves the way you expect while benefiting from B2’s internal scalability.

If you already manage lifecycle rules using S3 APIs, those same configurations can now run on Backblaze B2 with minimal or no changes.

The post A Deeper Look at S3 Compatible Lifecycle Rules in Backblaze B2 appeared first on Backblaze Blog | Cloud Storage & Cloud Backup

Lifecycle Rules: Now Supported Through S3-Compatible APIs

2025-12-12 Bala Krishna Gangisetty

Post Syndicated from Bala Krishna Gangisetty original https://www.backblaze.com/blog/lifecycle-rules-now-supported-through-s3-compatible-apis/

A decorative image showing a cloud, gears, and an alarm notification.

Managing object lifecycles is one of the simplest ways to control storage costs, keep buckets organized, and automate data hygiene. Backblaze B2 has supported lifecycle rules for years through our B2 Native APIs and the web application. Today, we’re expanding that support by adding S3 compatible lifecycle rule APIs, making it easier for you to use S3 tools and workflows to manage your data on Backblaze B2 (check out the docs: S3 Put Lifecycle Configuration, S3 Get Lifecycle Configuration, S3 Delete Lifecycle Configuration). This provides you with more flexibility and control over object management—all with Backblaze’s signature simplicity and affordability.

New survey: The Hidden Cost of Cloud Storage

We surveyed over 400 IT decision makers and one thing stood out. Surprise charges affect almost everyone. Learn what’s driving them—and how to avoid them.

What’s new

You can now create, edit, and manage lifecycle rules on B2 Cloud Storage using standard S3 compatible APIs, including:

Lifecycle rules can be applied to:

Entire buckets
Specific prefixes (e.g., logs/, images/monkeys/)
Nested folders with overlapping logic (e.g., animals/ and animals/cows/)

Why this matters

1. Seamless migrations from AWS

If you’re moving workloads from AWS S3, you can bring your existing lifecycle configurations with minimal changes. This reduces work while migrating your workloads to Backblaze B2.

2. More flexibility for complex data structures

With support for nested and overlapping prefixes, you can apply precise expiration rules to different datasets with overlapping prefixes in the same bucket. Whether you’re managing rapidly changing logs or long-term archives, lifecycle rules allow more controlled, automated retention.

3. Increased functionality for multi-cloud architectures

For teams looking to capitalize on multi-cloud architecture, this means more seamless integration up and down the stack.

4. Cost optimization

We apply the most cost-saving configuration for you in case of overlapping rules.

How it works

S3 compatible lifecycle rules define automated actions based on object age or status. With Backblaze B2, you can use rules to:

Delete objects after a set number of days
Hide objects after a set number of days
Expire noncurrent versions
Delete multipart uploads that are incomplete for a set number of days
Set multiple rules for nested prefixes for granular control

For example:

<LifecycleConfiguration>
    <Rule>
        <ID>DeleteTempFiles</ID>
        <Status>Enabled</Status>
        <Filter>
            <Prefix>temp/</Prefix>
        </Filter>
        <Expiration>
            <Days>30</Days>
        </Expiration>
    </Rule>
</LifecycleConfiguration>

This configuration automatically hides objects under temp/ after 30 days.

Example use cases

Automated cleanup for short-lived objects: Temporary files, build artifacts, or test data can expire automatically after a given retention period.
Compliance-driven policies: Organizations with strict retention and deletion requirements can enforce rules consistently across buckets.
Automated cleanup of incomplete multipart uploads: Save money by removing partial upload fragments that were never completed due to network interruptions, client failures, or abandoned sessions automatically.
Tiered retention for nested prefixes: Allow broad retention for the general dataset while enforcing a faster cleanup cycle for a particular subset that changes more frequently or has lower long-term value. For example, you can apply a 30-day expiration rule to all objects under the prefix temp/, while assigning a shorter 7-day expiration rule to the more specific prefix temp/webserver/.

How it complements existing B2 Lifecycle Rules

Backblaze continues to support lifecycle rules through the B2 Native API, which many customers use today. The new S3 compatible support offers an additional path for lifecycle automation, especially for customers with S3 style infrastructure or tooling.

Although both S3 compatible and B2 Native lifecycle rules can coexist, we strongly recommend using a single method depending on your workflow preferences to manage lifecycle rules.

Getting started

You can begin using lifecycle rules today through:

S3 compatible SDKs
S3 CLI tools
Direct API calls

Refer to Backblaze S3 Compatible API documentation for more details, specifically:

Most customers can apply their existing AWS lifecycle rules directly, without modification.

Availability

S3 compatible lifecycle rules are now available.

Wrapping up

Lifecycle rules help automate routine object cleanup and give teams predictable control over object retention. With the addition of S3 compatible lifecycle APIs, Backblaze B2 makes it even easier for customers to bring their existing S3 workflows to our platform while continuing to use the lifecycle management strategies they already trust.

Get started with Backblaze today by signing up for an account or contacting Sales.

The post Lifecycle Rules: Now Supported Through S3-Compatible APIs appeared first on Backblaze Blog | Cloud Storage & Cloud Backup

Preview Bucket Access Logs for Greater Visibility and Control

2025-04-24 Bala Krishna Gangisetty

Post Syndicated from Bala Krishna Gangisetty original https://www.backblaze.com/blog/preview-bucket-access-logs-for-greater-visibility-and-control/

You’ve always had insight into the buckets in your B2 Cloud Storage account, and now you can go deeper. With Bucket Access Logs, it’s possible to see a detailed record of operations performed against objects inside of a bucket. Whether you’re managing a growing archive, running audits, or troubleshooting automated workflows, these logs can provide the transparency needed to stay in control.

Starting today, Bucket Access Logs are available in limited preview. If you are interested, reach out to Support for more information.

What you can track with Bucket Access Logs

Once enabled, Bucket Access Logs record a range of operations performed on the objects in a bucket. That includes:

Uploads
Downloads
Deletions
And more

Every log entry includes details like the timestamp, operation type, and the object involved. If you’ve ever wished you had a record of what happened—and when—it’s now within reach.

Easy to configure: User interface (UI) and APIs

Bucket Access Logs are fully S3 compatible. You can configure logging through the Backblaze B2 web UI or programmatically via the S3 API using standard tools or SDKs. This makes it easy to integrate logging into your existing workflows and infrastructure without needing to learn anything new.

Because B2’s Bucket Access Logs are S3 compatible, your existing S3 log management tools will work seamlessly with B2 Logs. This allows you to use the same tools and processes you already have in place for monitoring, analyzing, and storing logs.

Important: Don’t enable access logging on the same bucket that you use as the log destination. This can result in an endless loop of logs generating more logs.

Once configured, you’ll begin to see new log objects arrive in the destination bucket as activity occurs in the source bucket. From there, you can analyze, archive, or pipe the data into other systems for further processing.

Preview: Configuring Bucket Access Logs in the UI

Here’s a preview of how you will be able to configure Bucket Access Logs via the user interface:

A screenshot of how to navigate to the Backblaze Bucket Access Logs feature.

A screenshot of how to configure Backblaze Bucket Access Logs.

This simple, intuitive interface lets you easily configure your logging settings, choose a destination bucket, and start tracking operations on your objects. Once enabled, you’ll have access to the logs directly in the destination bucket, with the details you need to monitor and analyze your data access patterns.

Use cases for Bucket Access Logs

Bucket Access Logs unlock a broad set of security, privacy, and operational workflows. Here are just a few examples of how you can use them:

1. Security and privacy monitoring

Organizations storing sensitive data—like security footage, personal files, or customer assets—often need detailed audit trails for compliance and accountability.

Log object access activity through pre-signed URLs and correlate access with specific users.
Track access times, IP addresses, and user actions to meet reporting requirements or identify suspicious behavior.
Detect potentially compromised application keys by analyzing activity patterns without disrupting all keys in use.
Enforce privacy policies by monitoring the source IP addresses of requests and verifying they match allowed sources.
Analyze latency and bandwidth metrics across object access requests to optimize data delivery.

2. Infrastructure and traffic control

When storage access is tightly integrated with content delivery networks (CDNs) or other network layers, it’s important to confirm that traffic flows through the correct paths.

Validate that object uploads originate only from approved CDNs or endpoints, not directly from unauthorized sources.
Detect misconfigurations early by comparing traffic origins to expected network patterns.

3. Usage tracking and audit trails

Understanding how data moves through your system can help with cost management, client reporting, and internal transparency.

Monitor object uploads and deletions that impact billing to better forecast and control storage costs.
Maintain a historical record of object activity for clients or partners who require verifiable data access trails.
Troubleshoot issues in automated workflows by reviewing the sequence of operations on specific objects.

Across these use cases, a common thread emerges: the need to know when, what, and where for activity happening inside your buckets.

Get started with Bucket Access Logs

Bucket Access Logs are available today in preview. To try it out, contact Support for more information.

For more detailed instructions and guidance on configuring and using Bucket Access Logs, check out the official Bucket Access Logs documentation.

Whether you’re building for compliance, monitoring security, or just want better observability into your workflows, Bucket Access Logs give you the visibility you need—right where your data lives.

The post Preview Bucket Access Logs for Greater Visibility and Control appeared first on Backblaze Blog | Cloud Storage & Cloud Backup

Effortlessly Managing Unfinished Large File Uploads with B2 Cloud Storage

2024-12-12 Bala Krishna Gangisetty

Post Syndicated from Bala Krishna Gangisetty original https://www.backblaze.com/blog/effortlessly-managing-unfinished-large-file-uploads-with-b2-cloud-storage/

An illustration of a cloud with boxes representing data uploading to the cloud.

Digital clutter isn’t just inefficient, it can be costly. And if cleaning up digital clutter in your business operations is one of your New Year’s resolutions for 2025, this post is for you. We’re talking about managing unfinished large file uploads.

One big culprit of digital clutter when it comes to cloud storage is unfinished large files. Managing unfinished large file uploads can be a complex task. If they are not managed well, they can consume space and incur costs without any benefit.

To address this, we’ve introduced a feature in Backblaze B2 Cloud Storage that automatically cancels unfinished large file uploads, saving you both time and money.

The challenge: Unfinished large file uploads

To upload a large file, you break it into smaller parts. You initiate the start notification. Each part is uploaded in parallel, and once all parts are received, a finish notification is sent. Only after the final step does the file become consumable. Sometimes, things don’t go as planned—network hiccups, API timeouts, or user interruptions can leave large file uploads unfinished. The process then likely restarts and completes successfully, but this leaves you with both a complete file and a partially completed file in your cloud storage instance. These unfinished uploads still take up storage space, leading to unnecessary costs.

Previously, users had to manually track down and delete these unfinished uploads. It’s error prone and time-consuming, and not an easy task especially with a large volume of files.

The solution: Canceling unfinished uploads through lifecycle rules

To streamline the process, we’ve added a feature that allows users to automatically cancel these incomplete uploads after a set number of days. By setting lifecycle rules through the B2 Native API, users can now specify how many days an unfinished large file can remain before it’s automatically deleted.

For detailed guidance on configuring this rule, check out our Lifecycle Rules Documentation.

Why it matters

This feature is useful in a variety of scenarios:

Network failures: If a network interruption prevents the final completion step, the unfinished upload will no longer remain indefinitely. Instead, it will be automatically cleared after the defined period, ensuring you aren’t paying for useless storage.
User interruptions: If an upload is manually paused or forgotten before completion, lifecycle rules will take care of these fragments, preventing forgotten uploads from lingering in storage.
Script failures: If your script fails or times out during the upload process, any incomplete files won’t go unnoticed. They’ll be cleared as per your rules, ensuring efficient storage management.

Cost-saving benefits

Unfinished uploads can quickly add up, both in storage usage and costs. By automatically canceling incomplete uploads, users can significantly reduce unnecessary expenses, keeping storage budgets under control. This is especially important for businesses with large-scale data transfers, where managing storage efficiency can have a direct impact on the bottom line.

What’s next?

Most users configure lifecycle rules through the console or Backblaze B2 command line tool (CLI), so we introduced this feature for the B2 Native API to address immediate customer needs while also laying the groundwork for integrating it into the B2 Cloud Storage web console. You can now use this feature via the CLI or B2 Native API. We’re working on adding UI support to make configuration even more accessible. Let us know in the comments if you’re looking for access to this feature via a different user interface.

In the meantime, here are a few steps you can take:

Implement lifecycle rules: Set rules that fit your upload behavior. Choose a reasonable timeframe to cancel unfinished large file uploads that balances with your cost-management goals.
Test the feature: Try configuring the lifecycle rule for a few test uploads to make sure it behaves as expected. Monitor how it handles interruptions or failures to ensure it aligns with your needs.
Monitor storage costs: Check your storage usage and billing before and after setting these rules to understand the impact on costs. Use the feedback to fine-tune your settings.
Stay tuned for UI updates: Keep an eye out for announcements regarding UI support for this feature. We’re committed to making it as intuitive and accessible as possible.

By leveraging lifecycle rules for unfinished large file uploads, you can maintain a cleaner, more efficient storage environment while saving money. For more details on configuring lifecycle rules, visit our API documentation.

The post Effortlessly Managing Unfinished Large File Uploads with B2 Cloud Storage appeared first on Backblaze Blog | Cloud Storage & Cloud Backup

Seamless Data Migration with Custom Upload Timestamps

2024-12-05 Bala Krishna Gangisetty

Post Syndicated from Bala Krishna Gangisetty original https://www.backblaze.com/blog/seamless-data-migration-with-custom-upload-timestamps/

A decorative image showing two cubes, representing data, moving from cloud to cloud. There are clocks above each cube.

Migrating data to the cloud? Ensuring that original timestamps remain intact through a cloud migration can be a critical factor for successful data management at scale. Losing these timestamps can lead to operational challenges that hinder your ability to track data effectively, set proper lifecycle rules, create custom events, and more.

Backblaze B2 Cloud Storage now offers the Custom Upload Timestamps feature to help you manage your data. Today, I’m sharing details on the new feature, benefits, and how to enable it.

What are Custom Upload Timestamps?

The Custom Upload Timestamps feature is designed specifically to retain the original timestamps of your files during a migration. It is especially beneficial for users who rely on lifecycle rules to dictate file deletion or archiving based on age for compliance, to track file age manually, or maintain historical context of file.

Imagine this scenario: You have a critical file on another cloud storage provider, governed by a lifecycle rule that deletes it after 1,000 days. If you move the file to Backblaze B2 on day 999, the timestamp would be overwritten and you’d have to restart that lifecycle from day one. However, with this new feature, the original timestamp remains intact, and the file will still get deleted on day 1,000, just as planned. This capability not only simplifies the migration process, but also ensures continuity in your data retention policies, keeping your storage costs in line with expectations.

Benefits of Custom Upload Timestamps

Lifecycle rules play a crucial role in managing data retention, particularly when migrating large datasets. Losing the original timestamps means you’d have to manually reconfigure your rules or wait much longer for lifecycle events to take effect. The benefits of retaining original timestamps extend beyond just lifecycle rules.

Here is why this feature is essential:

Operational efficiency: Knowing the original timestamp of files allows for better organization and tracking. This is vital for businesses that rely on historical data to inform decisions or manage projects. When timestamps reset, it can lead to confusion and disarray in managing files. You may find yourself dealing with files that should have been deleted or archived but aren’t because of the reset timeline.
Compliance: For organizations that must adhere to regulatory standards for data retention, preserving timestamps can help meet legal requirements. It provides a clear audit trail and evidence of when files were created or modified.
Decreased workload: Manually tracking and reconfiguring lifecycle rules consumes valuable time and resources. By retaining the original timestamps, you eliminate unnecessary workloads.
File age tracking: Whether you’re managing backups, archival processes, or simple organizational tasks, knowing the age of a file can inform your decisions regarding when to review or delete files.
Historical context: For projects that span long periods, retaining timestamps helps maintain the context of data. This can be critical for collaborative efforts or projects that require consistent documentation.

Ultimately, the custom upload timestamps feature supports greater data portability, making it easier to move and manage large datasets. It ensures that migration to B2 Cloud Storage is as seamless as possible—without the need to reset or alter your data management policies.

Ready to get started?

The Custom Upload Timestamps feature is enabled by default for all B2 Cloud Storage customers. To utilize this feature, you need to include the X-Bz-Custom-Upload-Timestamp parameter when calling the b2_upload_file API. This simple addition allows you to retain the original timestamp of your file, thereby preserving its lifecycle state without interruptions and ensuring that your data remains organized and easy to track.

By retaining the original timestamps, Backblaze B2 helps increase the ease and granularity with which you can manage your data, especially for organizations migrating large volumes of data. You can transition your data while maintaining control over important metadata like the original timestamp, streamlining your operations, improving overall efficiency, and avoiding the stress of potential compliance issues.

What next?

To make the most of the Custom Upload Timestamp feature, consider the following actionable steps:

Review your migration workflow. Before starting the migration, ensure that your processes include the X-Bz-Custom-Upload-Timestamp parameter in your upload scripts or APIs. This will help prevent any disruption in tracking important metadata.
Test the feature. Conduct a pilot migration with a small number of files. This will allow you to confirm that the timestamps are retained correctly. Monitor the behavior of your data tracking after this test migration to ensure everything operates as expected.
Verify lifecycle rules. Once you complete the migration, take the time to check that your lifecycle policies continue to function as intended on B2 Cloud Storage. This verification step is crucial to avoid unexpected data retention issues.
Engage with Support. If you have any questions or encounter challenges, don’t hesitate to reach out to our Support team. We’re here to help you make the most of B2 Cloud Storage.

For more details, visit our API documentation to ensure you’re ready for a smooth migration. By leveraging the Custom Upload Timestamps feature, you can simplify your data management processes.

The post Seamless Data Migration with Custom Upload Timestamps appeared first on Backblaze Blog | Cloud Storage & Cloud Backup

Introducing Scalable Application Keys for Enhanced Security and Performance

2024-10-15 Bala Krishna Gangisetty

Post Syndicated from Bala Krishna Gangisetty original https://www.backblaze.com/blog/introducing-scalable-application-keys-for-enhanced-security-and-performance/

A decorative image showing keys erupting from a server.

If you work in an industry with high performance and security demands like video surveillance, internet of things (IoT), and mobile applications, a new Backblaze capability could help your cloud workflows—Scalable Application Keys. This new capability enables you to generate application keys for your Backblaze B2 Cloud Storage Buckets at 150 times the current scale. When you need high volume, short-lived application keys to upload your data to cloud storage, you can’t tolerate bottlenecks—the Scalable Application Keys feature removes them.

Today, I’m digging into the challenges solved by Scalable Application Keys, the use cases where it has the most impact, and the benefits of the feature.

Jump to the docs

Check out our documentation for more information on Scalable Application Keys and how to work with them.

The challenges

Managing large volumes of data under strict security requirements sometimes requires using application keys programmatically to interact with cloud storage buckets. In the video surveillance industry, for example, you might upload massive amounts of video footage directly from security cameras to Backblaze B2, and you need to regularly refresh the application keys used by each of those cameras to maintain a robust security posture. However, this practice has a few unique requirements:

You can’t be hampered by rate limitations to generate the volume of application keys you need.
You need high key limits sufficient for the total volume of keys.
Throughput needs to be high to generate hundreds or thousands of new application keys simultaneously.

The irregular timing of key refreshes and the scale of operations can further amplify the problem, especially when hundreds or thousands of devices request new keys at the same time.

The solution: Scalable Application Keys

With the introduction of Scalable Application Keys, Backblaze B2 customers can generate and refresh keys at significantly higher volumes and throughput—without hitting hard limits on the number of keys. This feature is designed to accommodate the unique requirements of customers who need:

High key throughput: Create keys at scale, even when thousands of devices need new keys simultaneously. You can create up to 10,000 keys per minute.
Unlimited key generation: Scale without interruption—there’s no hard cap on the number of application keys that can be generated.
Short-lived keys: Easily generate keys with very short lifespans, enhancing security without compromising functionality.
S3 compatibility: Maintain support for the Backblaze S3 compatible API, allowing you to avoid costly firmware upgrades on your devices.

Real-world applications

This feature is particularly beneficial for customers with many endpoints that all upload to Backblaze B2 cloud individually through the S3 Compatible API. For example:

Video surveillance: Companies with large networks of security cameras can now easily refresh keys for each device frequently. When you operate tens of thousands of cameras that record sensitive footage, you need to be able to refresh application keys regularly to maintain security. With Scalable Application Keys, you can handle refreshes efficiently and continue scaling the number of cameras in operation without worry.

Mobile applications: Developers of mobile apps that store data in B2 Cloud Storage can generate unique keys for each user’s device. This is especially useful for apps that rely on user-generated and user-uploaded content, where each end device needs its own application key.

IoT devices: Businesses managing large fleets of IoT devices, where each device needs a unique and regularly refreshed application key, can ensure secure, individualized access to cloud storage.

Benefits of Scalable Application Keys

Enhanced security:
- Frequent key rotation becomes feasible at scale, significantly reducing the risk of unauthorized access.
- Short-lived keys minimize the window of vulnerability if a key is compromised.
- Customers can implement best practices for key management without extraneous components that could cause performance penalties.
Operational flexibility:
- Easily manage key generation for large numbers of devices, from thousands to millions.
- Accommodate sudden spikes in key requests, such as during system-wide updates or resets.
- Adapt to varying usage patterns throughout the day without hitting rate limits.
Cost effectiveness:
- Avoid expensive firmware upgrades by continuing to use the S3 Compatible API.
- Eliminate the need for complex workarounds or additional infrastructure to handle key management.

Scalable Application Keys not only solves existing limitations but also future-proofs your workflows by providing the flexibility and performance needed to scale without restriction.This feature allows you to securely manage access to B2 Cloud Storage, no matter the scale of your operations.

Ready to get started?

This feature is available upon request. If you’re an existing Backblaze B2 customer and want to get access to this capability, please contact our Support team to request access.

New to Backblaze? Contact our Sales team to learn more about how Scalable Application Keys can benefit your business and how to get started.

Once this feature is enabled, you can generate application keys at scale. Check out the documentation for more on how to use the feature.

What next?

Are you leveraging Scalable Application Keys to build more efficient and performant workflows? Share how it’s working for you so other organizations and developers can benefit from what you find. If you have any questions or feedback, please don’t hesitate to reach out to us.

The post Introducing Scalable Application Keys for Enhanced Security and Performance appeared first on Backblaze Blog | Cloud Storage & Cloud Backup

Backblaze B2 Event Notifications Now Generally Available

2024-10-03 Bala Krishna Gangisetty

Post Syndicated from Bala Krishna Gangisetty original https://www.backblaze.com/blog/using-b2-event-notifications/

No one likes being left out in the cold, least of all your data. With Backblaze B2 Event Notifications—now generally available—you can receive real-time notifications about object changes. That means that you can build more responsive and automated workflows across best-of-breed cloud platforms, saving time and money and improving your end users’ experiences. And, you can be alerted to changes in your data that may speed time to action.

Here’s how it works: With Backblaze B2 Event Notifications, any data changes within B2 Cloud Storage—like uploads, updates, or deletions—can automatically trigger actions in a workflow, including transcoding video files, spooling up data analytics, delivering finished assets to end users, and many others. Importantly, unlike many other solutions currently available, Backblaze’s service doesn’t lock you into one platform or require you to use legacy tools from AWS.

So, to businesses that want to create an automated workflow that combines different compute, content delivery networks (CDN), data analytics, and whatever other cloud service: Now you can, with the bonus of cloud storage at a fifth of the rates of other solutions and free egress.

Key capabilities

Flexible implementation: Event Notifications are sent as HTTP POST requests to the desired service or endpoint within your infrastructure or any other cloud service. This flexibility ensures seamless integration with your existing workflows. For instance, your endpoint could be Fastly Compute, AWS Lambda, Azure Functions, or Google Cloud Functions, etc.
Event categories: Specify the types of events you want to be notified about, such as when files are uploaded and deleted. This allows you to receive notifications tailored to your specific needs. For instance, you have the flexibility to specify different methods of object creation, such as copying, uploading, or multipart replication, to trigger event notifications. You can also manage Event Notification rules through UI or API.
Filter by prefix: Define prefixes to filter events, enabling you to narrow down notifications to specific sets of objects or directories within your storage on Backblaze B2. For instance, if your bucket contains audio, video, and text files organized into separate prefixes, you can specify the prefix for audio files in order to receive Event Notifications exclusively for audio files.
Custom headers: Include personalized HTTP headers in your Event Notifications to provide additional authentication or contextual information when communicating with your target endpoint. For example, you can use these headers to add necessary authentication tokens or API keys for your target endpoint, or include any extra metadata related to the payload to offer contextual information to your webhook endpoint, and more.
Signed notification messages: You can configure outgoing messages to be signed by the Event Notifications service, allowing you to validate signatures and verify that each message was generated by Backblaze B2 and not tampered with in transit.
Test rule functionality: Validate the functionality of your target endpoint by testing Event Notifications before deploying them into production. This allows you to ensure that your integration with your target endpoint is set up correctly and functioning as expected.
Retries: Event Notifications are automatically re-sent if the initial delivery attempt fails. This feature increases the reliability of Event Notifications by ensuring that temporary issues do not result in missed events, thus maintaining the integrity of your event-driven workflows.
Delivery: Event Notifications are designed for the at-least-once delivery guarantee to ensure Event Notifications are delivered reliably, even in the presence of network or system failures.

Versatile use cases

This past April, we announced Event Notifications in preview, and folks have put Event Notifications to work in some incredible ways. Today, we’re sharing some of the key use cases that came out of the preview to simplify your own workflows so you can focus on extracting insights from your data, rather than managing the logistics of data processing.

A diagram describing how Event Notifications work.

Automated media processing

Video transcoding: Many customers use Event Notifications to automate their video transcoding workflows. When a new video is uploaded to a Backblaze B2 Bucket, an Event Notification can trigger a transcoding process to generate all videos in the desired format.

Image processing: Similarly, customers also use Event Notifications to set up automated image processing pipelines, such as generating thumbnails or applying filters when new images are added to a Backblaze B2 Bucket.

Media processing is not limited to video transcoding or image processing. It can be extended to any other media processing workflow, minimizing the number of steps in the workflow.

Backup monitoring

Customers can receive notifications when backups are successfully uploaded to a Backblaze B2 Bucket with Event Notifications, providing peace of mind and ensuring data protection. Whether you want to track your nightly or monthly backups, you can get a notification when they are completed.

Presigned URL monitoring

Using a presigned URL is a standard way to share a file without giving the full access to your Backblaze B2 Bucket. Customers are using Event Notifications to know when their clients upload files via presigned URLs to Backblaze B2. They can get a callback to confirm that the upload is complete.

Security and access control

Unauthorized access detection: Customers are using Event Notifications to track access to highly confidential video files and report back to their clients as needed. Event Notifications help them detect any unauthorized access and take immediate action.

Audit trails: Some customers are using Event Notifications to create a detailed audit log of supported bucket activities through Event Notifications, which is useful for their compliance and security purposes.

Anomaly/malware detection: Event Notifications can strengthen security by detecting unusual access patterns, like malware that deletes or overwrites backups, by getting notifications of changes to Backblaze B2 Buckets.

Integration with external systems

Database synchronization: Customers use Event Notifications to keep databases in sync with the state of their Backblaze B2 Buckets. It’s critical to ensure data consistency across systems as their applications run on the databases.

Document management system: Some customers use Event Notifications with a workflow system to track document revisions, uploads, and deletes, or to notify team members when specific documents are uploaded or deleted.

Analytics and reporting

Performance analytics: Some customers use Event Notifications to monitor their backup performance and completion times, helping to optimize their data management strategies.

Usage tracking: Event Notifications can help track storage consumption by individual users or projects, facilitating better resource management and cost allocation.

These are just a few of the use cases our preview customers shared with us, and the sky is truly the limit for ways Event Notifications can empower you to simplify and streamline your workflows.

Ready to get started?

For existing customers working with a Backblaze account manager, Event Notifications is enabled for you today. If you need assistance, your account manager is happy to help.

For existing customers who are not currently working with an account manager, please contact our Support team to request access.

New to Backblaze? Contact our Sales team to learn more about how Event Notifications can benefit your business and how to get started.

A screenshot of the Backblaze account screen where you can enable Event Notifications.

Once Event Notifications is enabled on your account, log in to your Backblaze B2 account, navigate to the Buckets page, and click on the Event Notifications section. From there, you can set up notification rules for the events you want to track or configure notifications through our API.

For detailed instructions and best practices, check out the Event Notifications documentation.

What’s next?

Please do share how you’re leveraging Event Notifications to build more efficient, automated, and responsive workflows so that other organizations and developers can benefit from what you find. If you have any questions or feedback, please don’t hesitate to reach out to us.

The post Backblaze B2 Event Notifications Now Generally Available appeared first on Backblaze Blog | Cloud Storage & Cloud Backup

Command Like a Pro with New Backblaze B2 CLI Enhancements

2024-08-20 Bala Krishna Gangisetty

Post Syndicated from Bala Krishna Gangisetty original https://www.backblaze.com/blog/command-like-a-pro-with-new-backblaze-b2-cli-enhancements/

An image of a computer monitor with the words B2 Command Line Interface Tool Version 4.1.0

The tools you use impact your efficiency, productivity, and the quality of your work. That’s true whether you’re a carpenter looking for the best saw blades, a chef choosing high-quality knives, or a developer or programmer investing in top-notch software. The B2 Command Line Interface (CLI) is one tool that you can use to interact with B2 Cloud Storage, and some recent improvements make it a more powerful, intuitive part of your arsenal.

It’s been a while since our last blog about the Backblaze B2 Command Line Tool (B2 CLI for short). Today, we’re sharing more details on the key enhancements and new features as part of the B2 CLI version 4.1.0.

Let’s dive into the highlights of these changes and explore how they can elevate your B2 CLI experience.

User experience enhancements

1. A new nested command structure

Gone are the days of sifting through a long list of commands to find what you need. The B2 command structure has been revamped to be more intuitive and organized. With the new nested command structure, related commands are logically grouped together. The new structure looks like b2 <resource>. It makes it easier for you to locate and utilize the functionality you require. Whether you’re managing files, buckets, keys, or accounts, commands are now categorized in a way that aligns with their functions. This gives you a clearer and more concise enhanced user experience.

An image listing the usage tags for the Backblaze B2 CLI — New command structure.

2. Streamlined `ls` and `rm` commands

Why use two when one will do? The b2 ls and b2 rm commands can now accept a single cohesive string, B2 URI (e.g., b2://bucketName/path), instead of two separate positional arguments, giving you enhanced consistency and usability. It simplifies the command syntax and reduces potential for errors by eliminating the chance of misplacing or mistyping one of the separate arguments. And it ensures that the bucket and file path are always correctly associated with each other. This change minimizes confusion and helps to avoid common mistakes that can occur with multiple arguments.

In addition, some commands, such as b2 file large parts, accept a B2 ID URI (e.g. b2id://4_zf1f51fb…), which specifies a file by its unique identifier (a.k.a. Fguid).

Some redundant commands have also been deprecated with the introduction of the B2 and B2 URIs. For example: download-file-by-id and download-file-by-name functionality is available through b2 file download b2://bucketname/path and b2 file download b2id://fileid command.

3. Enhanced credential management

To enhance security and performance, the CLI will no longer persist credentials on disk if they are passed through B2_* environment variables (that is, B2_application_key_id and B2_application_key). This reduces the risk of unauthorized access to your credentials and improves the overall security of your environment.

At the same time, it’s important that security is balanced with performance. To address this, you can persist your credentials to local cache and can continue using local cache for better performance. You can explicitly choose to persist your credentials by using the b2 account authorize command.

By eliminating the automatic persistence of credentials from environment variables and providing a clear method to manage local caching, you now have a balanced approach that keeps your data secure while ensuring efficient CLI operations.

4. Transition to kebab-case flags

Previously CLI flags had mixed camelCase and kebab-case styles. Users needed to remember the style to use it along with the name for the option. But kebab-case, where words are separated by hyphens (e.g., --my-flag), offers a clearer and more straightforward way to read and interpret flags. We’ve transitioned all CLI flags to --kebab-case. This style not only enhances readability, making it easier to understand complex commands at a glance, but also makes it easy to remember. It’s particularly beneficial when flags are composed of multiple words, as it reduces visual clutter and makes the flag names more accessible.

5. Simplified listing with `ls`

Ever wondered how to list all your buckets in one go? Now, you can call b2 ls without any arguments to do this. Whether you’re managing multiple buckets or just need a quick overview of your entire bucket inventory, the ability to list all buckets with a single command saves you time and effort. The enhancement to the b2 ls command is all about making your life easier. (As an aside, it’s also the quickest way to check that Backblaze B2 is correctly configured and you’re using the right set of credentials.)

6. Handy aliases for common flags

Why go the long way when you can take shortcuts? You can now use -r as an alias for the --recursive argument and -q for the --quiet argument. These shortcuts make your command-line interactions quicker and more efficient. You can get things done with fewer presses.

7. Global quiet mode

The --quiet option is now available for all commands, allowing you to suppress all messages printed to stdout and stderr. This is particularly useful for scripting and automation, where you want to minimize output.

8. Autocomplete

This enhancement for the B2 CLI means that you no longer have to remember and type out lengthy command arguments or options manually. As you start typing a command, the CLI will provide you with suggestions for completing the command, options, and arguments based on the context of your input. This can significantly save up your time and help you avoid typos or incorrect entries.

New features to boost your productivity

In addition to the CLI enhancements, we’ve also recently announced a few new features and capabilities for Backblaze B2, including:

Event Notifications: Event Notifications helps you automate workflows and integrate Backblaze B2 with other tools and systems. You can now manage Event Notification rules through b2 bucket notification-rule commands directly from the CLI. The feature is available in public preview. If you’re interested, check out the announcement and sign up here.
Unhide files with ease: Previously, if you needed to reverse the hiding of a file, the process could be cumbersome or require multiple steps. To restore hidden files, using the b2 file unhide command is now as simple as it sounds. You only need to specify the file you want to unhide, and the command will handle the rest. This ensures that you can quickly and accurately restore file visibility without unnecessary complications. Whether you’ve hidden previous backup files and need to access them again, or when reorganizing your storage or adjusting file visibility for different users, or if you unintentionally hide files and need to make them visible for auditing or review purposes, you can use this command swiftly.
Custom file upload timestamps: You can now enable custom file upload timestamps on your account, enabling you to preserve original upload times for your files. This feature is ideal for maintaining accurate records for compliance and reporting, and it gives you greater control over the file metadata. If you’d like to enable the feature, please reach out to Backblaze Support.

In addition to the above highlights, we’ve implemented crucial fixes to improve the stability and reliability of the CLI. We’ve also made several improvements to our documentation, ensuring you have the guidance you need right at your fingertips.

Start using the new features today

The easier we can make your CLI experience, the easier your job becomes and the more you can get out of Backblaze B2. Install or upgrade the B2 CLI today to take advantage of all the new features.

As always, we value your feedback. If you have any thoughts or experiences to share as you start using the new enhancements and features, please let us know in the comments or submit feedback via our Product Portal. Your input is crucial in helping us continue to improve and innovate.

Happy coding, and enjoy the new B2 CLI offerings!

The post Command Like a Pro with New Backblaze B2 CLI Enhancements appeared first on Backblaze Blog | Cloud Storage & Cloud Backup

Automate Your Data Workflows with Backblaze B2 Event Notifications

2024-04-15 Bala Krishna Gangisetty

Post Syndicated from Bala Krishna Gangisetty original https://www.backblaze.com/blog/announcing-event-notifications/

Public Preview Update: July 31, 2024

Backblaze Event Notifications is now in public preview. If you’re interested in joining the waitlist, feel free to sign up here.

Backblaze believes companies should be able to store, use, and protect their data in whatever way is best for their business—and that doing so should be easy. That’s why we’re such fierce advocates for the open cloud and why today’s announcement is so exciting.

Event Notifications—available in public preview—gives businesses the freedom to build automated workloads across the different best-of-breed cloud platforms they use or want to use, saving time and money and improving end user experiences.

Here’s how: With Backblaze Event Notifications, any data changes within Backblaze B2 Cloud Storage—like uploads, updates, or deletions—can automatically trigger actions in a workflow, including transcoding video files, spooling up data analytics, delivering finished assets to end users, and many others. Importantly, unlike many other solutions currently available, Backblaze’s service doesn’t lock you into one platform or require you to use legacy tools from AWS.

If you’re already a Backblaze customer, you can join the waiting list for the Event Notifications preview by signing up here. Once you’re admitted to the preview, the Event Notifications option will become visible in your Backblaze B2 account.

A screenshot of the where to find Event Notifications in your Backblaze account.

Not a Backblaze customer yet? Sign up for a free Backblaze B2 account and join the waitlist. Read on for more details on how Event Notifications can benefit you.

With Event Notifications, we can eliminate the final AWS component, Simple Queue Service (SQS), from our infrastructure. This completes our transition to a more streamlined and cost-effective tech stack. It’s not just about simplifying operations—it’s about achieving full independence from legacy systems and future-proofing our infrastructure.

— Oleh Aleynik, Senior Software Engineer and Co-Founder at CloudSpot.

A Deeper Dive on Backblaze’s Event Notifications Service

Event Notifications is a service designed to streamline and automate data workflows for Backblaze B2 customers. Whether it’s compressing objects, transcoding videos, or transforming data files, Event Notifications empowers you to orchestrate complex, multistep processes seamlessly.

The top line benefit of Event Notifications is its ability to trigger processing workflows automatically whenever data changes on Backblaze B2. This means that as soon as new data is uploaded, changed, or deleted, the relevant processing steps can be initiated without manual intervention. This automation not only saves time and resources, but it also ensures that workflows are consistently executed with precision, free from human errors.

What sets Event Notifications apart is its flexibility. Unlike some other solutions that are tied to specific target services, Event Notifications allows customers the freedom to choose the target services that best suit their needs. Whether it’s integrating with third-party applications, cloud services, or internal systems, Event Notifications seamlessly integrates into existing workflows, offering unparalleled versatility.

Finally, Event Notifications doesn’t only bring greater ease and efficiency to workflows, it is also designed for very easy enablement. Whether via browser UI or SDKs or APIs or CLI, it is incredibly simple to set up a notification rule and integrate it with your preferred target service. Simply choose your event type, set the criteria, and input your endpoint URL, and a new workflow can be configured in minutes.

Public Preview Update: July 31, 2024

Additional capabilities offered in the public preview include:

Retries: Event Notifications are automatically re-sent if the initial delivery attempt fails. This feature increases the reliability of Event Notifications by ensuring that temporary issues do not result in missed events, thus maintaining the integrity of your event-driven workflows.
Delivery: Event Notifications are designed for the at-least-once delivery guarantee to ensure Event Notifications are delivered reliably, even in the presence of network or system failures.

What Is Backblaze B2 Event Notifications Good For?

By leveraging Event Notifications, Backblaze B2 customers can simplify their data processing pipelines, reduce manual effort, and increase operational efficiency. With the ability to automate repetitive tasks and handle millions of objects per day, businesses can focus on extracting insights from their data rather than managing the logistics of data processing.

A diagram showing the steps of event notifications.

Automating tasks: Event Notifications allows users to trigger automated actions in response to changes in stored objects like upload, delete, and hide actions, streamlining complex data processing tasks.

Orchestrating workflows: Users can orchestrate multi-step workflows, such as compressing files, transcoding videos, or transforming data formats, based on specific object events.

Integrating with services: The feature offers flexible integration capabilities, enabling seamless interaction with various services and tools to enhance data processing and management.

Monitoring changes: Users can efficiently monitor and track changes to stored objects, ensuring timely responses to evolving data requirements and faster security response to safeguard critical assets.

What Are Some of the Key Capabilities of Backblaze B2 Event Notifications?

Flexible Implementation: Event Notifications are sent as HTTP POST requests to the desired service or endpoint within your infrastructure or any other cloud service. This flexibility ensures seamless integration with your existing workflows. For instance, your endpoint could be Fastly Compute, AWS Lambda, Azure Functions, or Google Cloud Functions, etc.

Event Categories: Specify the types of events you want to be notified about, such as when files are uploaded and deleted. This allows you to receive notifications tailored to your specific needs. For instance, you have the flexibility to specify different methods of object creation, such as copying, uploading, or multipart replication, to trigger event notifications. You can also manage Event Notification rules through UI or API.

Filter by Prefix: Define prefixes to filter events, enabling you to narrow down notifications to specific sets of objects or directories within your storage on Backblaze B2. For instance, if your bucket contains audio, video, and text files organized into separate prefixes, you can specify the prefix for audio files to receive event notifications exclusively for audio files.

Custom Headers: Include personalized HTTP headers in your event notifications to provide additional authentication or contextual information when communicating with your target endpoint. For example, you can use these headers to add necessary authentication tokens or API keys for your target endpoint, or include any extra metadata related to the payload to offer contextual information to your webhook endpoint, and more.

Signed Notification Messages: You can configure outgoing messages to be signed by the Event Notifications service, allowing you to validate signatures and verify that each message was generated by Backblaze B2 and not tampered with in transit.

Test Rule Functionality: Validate the functionality of your target endpoint by testing event notifications before deploying them into action. This allows you to ensure that your integration with your target endpoint is set up correctly and functioning as expected.

Want to Learn More About Event Notifications?

Join our waiting list by signing up here.
View our on demand webinar.
For Quick-Start, User Guide, and API documentation, head over to the Event Notification section of our documentation.

Event Notifications represents a significant advancement in data management and automation for Backblaze B2 users. By providing a flexible and powerful capability for orchestrating data processing workflows, Backblaze continues to empower businesses to unlock the full potential of their data with ease and efficiency.

The post Automate Your Data Workflows with Backblaze B2 Event Notifications appeared first on Backblaze Blog | Cloud Storage & Cloud Backup

From Chaos to Clarity: 6 Best Practices for Organizing Big Data

2023-05-09 Bala Krishna Gangisetty

Post Syndicated from Bala Krishna Gangisetty original https://www.backblaze.com/blog/from-chaos-to-clarity-6-best-practices-for-organizing-big-data/

There’s no doubt we’re living in the era of big data. And, as the amount of data we generate grows exponentially, organizing it becomes all the more challenging. If you don’t organize the data well, especially if it resides in cloud storage, it becomes complex to track, manage, and process.

That’s why I’m sharing six strategies you can use to efficiently organize big data in the cloud so things don’t spiral out of control. You can consider how to organize data from different angles, including within a bucket, at the bucket level, and so on. In this article, I’ll primarily focus on how you can efficiently organize data on Backblaze B2 Cloud Storage within a bucket. With the strategies described here, you can consider what information you need about each object you store and how to logically structure an object or file name, which should hopefully equip you to better organize your data.

Before we delve into the topic, let me give a super quick primer on some basics of object storage. Feel free to skip this section if you’re familiar.

First: A Word About Object Storage

Unlike traditional file systems, when you’re using object storage, you have a simple, flat structure with buckets and objects to store your data. It’s designed as a key-value store so that it can scale to the internet.

There are no real folders in the object store file system. The impact of this is data is not separated into a hierarchical structure. That said, there are times that you actually want to limit what you’re querying. In that instance, prefixes provide a folder-like look and feel, which means that you can get all the benefits of having a folder without any major drawbacks. From here onwards, I’ll generally refer to folders as prefixes and files as objects.

With all that out of the way, let’s dive into the ways you can efficiently organize your data within a bucket. You probably don’t have to employ all these guidelines. Rather, you can pick and choose what best fits your requirements.

1. Standardize Object Naming Conventions

Naming conventions, simply put, are rules about what you and others within your organization name your files. For example, you might decide it’s important that the file name describes the type of file, the date created, and the subject. You can combine that information in different ways and even format pieces of information differently. For example, one employee may think it makes more sense to call a file Blog Post_Object Storage_May 6, 2023, while another might think it makes sense to call that same file Object Storage.Blog Post.05062023.

These decisions do have impact. For instance that second date format would confuse the majority of the world who uses the day/month/year format, as opposed to month/day/year as is common in the United States. . And, what if you take a different kind of object as your example, one that versioning becomes important for? When do code fixes for version 1.1.3 actually become version 1.2.0?

Simply put, having a consistent and well thought out naming convention for your objects makes life easy when it comes to organizing data. You can and should derive and follow a pattern while naming the objects. Based on your requirements, a consistent and well thought out pattern for naming your objects makes it easy to find and sort files.

2. Harness The Power of Prefixes

Prefixes provide a folder-like look and feel on object stores (as there are no real folders). The prefixes are powerful and immensely helpful while effectively organizing your data and allow you to make good use of the wildcard function in your command line interface (CLI). A good way to think about a prefix is that it creates hierarchical categories in your object name. So, if you were creating a prefix about locations and using slashes as a delimiter, you’d create something like this:

North America/Canada/British Columbia/Vancouver

Let’s imagine a scenario where you generate multiple objects per day, you can structure your data per year per month and per day. An example prefix would be year=2022/month=12/day=17/ for the multiple objects generated on December 17, 2022. If you queried for all objects created on that day, you might get results that look like this:

2022/12/17/Object001
2022/12/17/Object002
2022/12/17/Object003

On the Backblaze B2 secure web application, you will notice these prefixes create “folders” three levels deep, year=2022, month=12 and day=17. The folder, day=17, will contain all the objects with the example prefix in their names. Partitioning data is helpful to easily track your data. It is also helpful in the processing workflows that use your data after storing it on Backblaze B2.

3. Programatically Separate Data

After ingesting data into B2 Cloud Storage, you may have multiple workflows to make use of data. These workflows are often tied to specific environments and in turn generate more new data. Production, staging, and test are some examples of environments.

We recommend keeping the copy of raw data and the new data generated by a specific environment separate. This lets you keep track of when and how changes were made to your datasets, which in turn means you can roll back to a native state if you need to or replicate the change if it’s producing the results you want. In occasions of undesirable events like a bug in your processing workflow, you can rerun the workflow with a fix in place on the raw copy of data. To illustrate the data specific to the production environment, an example would be /data/env=prod/type=raw, and /data/env=prod/type=new.

4. Leverage Lifecycle Rules

While your data volume is ever increasing, we recommend reviewing and cleaning up unwanted data from time to time. Doing that process manually is very cumbersome, especially when you have large amounts of data. Never fear: Lifecycle rules to the rescue. You can set up lifecycle rules to automatically hide or delete data based on a certain criteria which you can configure on Backblaze B2.

For example, some workflows create temporary objects during processing. It’s useful to briefly retain these temporary objects to diagnose issues, but they have no long-term value. A lifecycle rule could specify that objects with the /tmp prefix are to be deleted two days after they are created.

5. Enable Object Lock

Object Lock makes your data immutable for a specified period of time. Once you set that period of time, even the data owner can’t modify or delete the data. This helps to prevent an accidental overwrite of your data, creates trusted backups, and so on.

Let’s imagine a scenario where you upload data to B2 Cloud Storage and run a workflow to process the data which in turn generates new data, and use our production, staging, and test example again. Due to a bug, your workflow tries to overwrite your raw data. When you have Object Lock set, the rewrite won’t happen, and your workflow will likely error out.

6. Customize Access With Application Keys

There are two types of application keys on B2 Cloud Storage:

Your master application key. This is the first key you have access to and is available on the web application. This key has all capabilities, access to all buckets, and has no file prefix restrictions or expiration. You only have one master application key—if you generate a new one, your old one becomes invalid.
Non-master application key(s). This is every other application key. They can be limited to a bucket, or even files within that bucket using prefixes, can set read-only, read-write, or write-only access, and can expire.

That second type of key is the important one here. Using application keys, you can grant or restrict access to data programmatically. You can make as many application keys in Backblaze B2 as you need (the current limit is 100 million). In short: you can get detailed in customizing access control.

In any organization, it’s always best practice to only grant users and applications as much access as they need, also known as the principle of least privilege. That rule of thumb reduces risk in security situations (of course), but it also reduces the possibility for errors. Extend this logic to our accidental overwrite scenario above: if you only grant access to those who need to (or know how to) use your original dataset, you’re reducing the risk of data being deleted or modified inappropriately.

Conversely, you may be in a situation where you want to grant lots of people access, such as when you’re creating a cell phone app, and you want your customers to review it (read-only access). Or, you may want to create an application key that only allows someone to upload data, not modify existing data (write-only access), which is useful for things like log files.

And, importantly, this type of application key can be set to expire, which means that you will need to actively re-grant access to people. Making granting access your default (as opposed to taking away access) means that you’re forced to review and validate who has access to what at regular intervals, which in turn means you’re less likely to have legacy stakeholders with inappropriate access to your data.

Two great places to start here are restricting the access to specific data by tying application keys to buckets and prefixes and restricting the read and write permissions of your data. You should think carefully before creating an account-wide application key, as it will have access to all of your buckets, including those that you create in the future. Restrict each application key to a single bucket wherever possible.

What’s Next?

Organizing large volumes by putting some guidelines into practice can make it easy to store your data. Pick and choose the ones that best fit your requirements and needs. So far, we have talked about organizing the data within a bucket, and, in the future, I’ll provide some guidance about organizing buckets on B2 Cloud Storage.

The post From Chaos to Clarity: 6 Best Practices for Organizing Big Data appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

Noise

All posts by Bala Krishna Gangisetty

The collective thoughts of the interwebz

Why S3 compatible lifecycle rules matter

API surface area and XML structure

Example put lifecycle request

Example get lifecycle response

Mapping from S3 rules to B2 rules

Expiration behavior

How this behaves internally

Expiring the current version

Overlapping and nested prefix rules

Multipart upload cleanup

Internal execution model

Lifecycle execution model

Error handling and validation

Compatibility considerations

Final thoughts

New survey: The Hidden Cost of Cloud Storage

What’s new

Why this matters

1. Seamless migrations from AWS

2. More flexibility for complex data structures

3. Increased functionality for multi-cloud architectures

4. Cost optimization

How it works

Example use cases

How it complements existing B2 Lifecycle Rules

Getting started

Availability

Wrapping up

What you can track with Bucket Access Logs

Easy to configure: User interface (UI) and APIs

Preview: Configuring Bucket Access Logs in the UI

Use cases for Bucket Access Logs

1. Security and privacy monitoring

2. Infrastructure and traffic control

3. Usage tracking and audit trails

Get started with Bucket Access Logs

The challenge: Unfinished large file uploads

The solution: Canceling unfinished uploads through lifecycle rules

Why it matters

Cost-saving benefits

What’s next?

What are Custom Upload Timestamps?

Benefits of Custom Upload Timestamps

Ready to get started?

What next?

Jump to the docs

The challenges

The solution: Scalable Application Keys

Real-world applications

Benefits of Scalable Application Keys

Ready to get started?

What next?

Key capabilities

Versatile use cases

Automated media processing

Backup monitoring

Presigned URL monitoring

Security and access control

Integration with external systems

Analytics and reporting

Ready to get started?

What’s next?

User experience enhancements

1. A new nested command structure

2. Streamlined ls and rm commands

3. Enhanced credential management

4. Transition to kebab-case flags

5. Simplified listing with ls

6. Handy aliases for common flags

7. Global quiet mode

8. Autocomplete

New features to boost your productivity

Start using the new features today

Public Preview Update: July 31, 2024

A Deeper Dive on Backblaze’s Event Notifications Service

Public Preview Update: July 31, 2024

What Is Backblaze B2 Event Notifications Good For?

What Are Some of the Key Capabilities of Backblaze B2 Event Notifications?

Want to Learn More About Event Notifications?

First: A Word About Object Storage

2. Streamlined `ls` and `rm` commands

5. Simplified listing with `ls`