Tag Archives: Featured

The Truth About Cloud Security Costs: Why High Costs Don’t Always Mean Better Protection

2025-09-16 Kari Rivas

Post Syndicated from Kari Rivas original https://www.backblaze.com/blog/the-truth-about-cloud-security-costs-why-high-costs-dont-always-mean-better-protection/

A decorative image showing a shield and gears.

When evaluating cloud providers, cost is often the most visible factor—but in enterprise IT, information security (InfoSec), and compliance, security is always the first (and likely most important) concern. As a technology leader, you know that determining “acceptable” risk is a moving target, but you’re likely also regularly squeezed by budget pressures and a mandate to contribute to the company’s bottom line.

Taking a chance on providers with lower price tags might feel like too big of a risk—lower-cost providers must be sacrificing something, and all too often, that something is security. Right?

It’s a fair question, but the answer might surprise you. Today, we’re talking about how specialized cloud providers provide surprising value—and even provide security benefits—when compared with traditional, hyperscaler architectures. Let’s talk about what you need to know to evaluate a cloud provider’s security posture.

Want to hear from the experts?

Join our upcoming session to hear from Backblaze experts Troy Liljedahl, Sr. Director, Solutions Engineering, and Pat Patterson, Chief Technical Evangelist, about the knowledge and features you need to stay ahead of modern threats.

Join us to learn:

Foundational controls: Master the best practices for using encryption, Object Lock, access keys, role-based access controls, and more to build a solid defense.
Advanced threat detection: Get an exclusive look at Backblaze’s new feature, Anomaly Alerts, which helps detect irregular and potentially suspicious data access patterns.
A unified approach: Understand how to integrate these powerful features to create a strong, easy-to-manage security strategy.

How specialized cloud providers provide security benefits

In theory, cloud architecture encourages redundancy. But in practice, many companies, even those using multi-cloud strategies, tend to consolidate key services like authentication and orchestration with a single vendor. When that vendor’s services go down, it doesn’t matter that your data is replicated across three availability zones in the same region. If you can’t log in to access it, your redundancy becomes purely theoretical. This year alone, major outages at Google, IBM Cloud, and others have had widespread consequences.

Specialized cloud providers and multi-cloud strategies provide inherent benefits here.

Vendor transparency: Open cloud providers publish clear, detailed practices around architecture, encryption, and compliance rather than burying them behind opaque marketing claims. This transparency allows your teams to independently validate security assurances.
Avoiding lock-in: Multi-cloud strategies ensure you’re not beholden to a single vendor’s security practices. If one provider falls short, data replication and redundancy across platforms can maintain both compliance and resilience.
Risk distribution: By spreading workloads across providers, organizations mitigate the risk of a single point of failure, outage, or vendor breach.
Compliance flexibility: Different providers may align more strongly with specific frameworks (SOC 2, HIPAA, GDPR, etc.), giving enterprises options to meet evolving regulatory demands.

This means that organizations don’t have to choose between cost efficiency and security—they can and should get both.
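
To make the risk-distribution point concrete, here is a minimal sketch of the availability arithmetic. It assumes provider failures are statistically independent, which real outages often violate, and the 99.9% figures are illustrative rather than measured values for any vendor. The function names are hypothetical.

```python
def parallel_availability(*availabilities: float) -> float:
    """Availability of redundant copies: data is reachable if ANY copy is up.

    Assumes failures are independent, which is an idealization.
    """
    all_down = 1.0
    for a in availabilities:
        all_down *= (1.0 - a)
    return 1.0 - all_down


def with_shared_dependency(shared: float, *availabilities: float) -> float:
    """Same redundancy, but every request first passes through one shared
    service (for example, a single vendor's authentication)."""
    return shared * parallel_availability(*availabilities)


# Illustrative numbers only: two copies at 99.9% each.
redundant = parallel_availability(0.999, 0.999)
# The same two copies behind one 99.9% login service.
gated = with_shared_dependency(0.999, 0.999, 0.999)

print(f"two independent copies:     {redundant:.6f}")
print(f"behind a shared dependency: {gated:.6f}")
```

Under these assumptions, the redundant pair is reachable more often than either copy alone, but a single shared service in front of it caps the result below that service's own availability. That is the consolidation problem described earlier in this section.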

How to evaluate a cloud provider’s security posture

Choosing the right cloud provider isn’t just about price, features, or performance—it’s about knowing they can safeguard your data and prove it. Here are key areas to assess:

Architecture & physical security
- Does the provider operate its own infrastructure or rely on generic colocation facilities?
- What physical safeguards (biometrics, restricted access, surveillance) protect the data centers?
Encryption & data protection
- Is data encrypted both in transit (TLS/SSL) and at rest (AES-256 or equivalent)?
- Are key management options available, including customer-managed keys?
- Is immutability (Object Lock or write once, read many (WORM) storage) supported for ransomware defense?
Access & identity controls
- Are granular permissioning and role-based access (RBAC) controls available?
- Does the provider support single sign-on (SSO), multi-factor authentication (MFA), and integration with enterprise identity systems?
- Can admins maintain clear audit logs of all access and changes?
Compliance & certifications
- Which third-party attestations does the provider maintain (SOC 2, HIPAA, PCI-DSS, GDPR, ISO)?
- Can they provide signed agreements (such as Business Associate Agreements (BAAs)) as needed for regulated industries?
Resilience & multi-cloud strategy
- Do they offer replication across regions or the ability to integrate into a multi-cloud strategy?
- How quickly can you move workloads or data out if you need to change vendors or access data in case of emergency?

By using this evaluation framework, IT leaders can look past marketing promises and price tags, focusing on verifiable controls and independent certifications.

The hyperscaler tax for cloud security

Many enterprises assume that higher cloud storage costs from hyperscalers like AWS, Azure, or Google Cloud translate directly into better security. In reality, much of that premium is a “hyperscaler tax” driven by complex business models, bundled services, and legacy infrastructure—not inherently superior protection. Specialized cloud providers can often deliver the same enterprise-grade security controls—encryption, compliance certifications, access management—without the inflated price tag, proving that security and affordability are not mutually exclusive.

Building a better mousetrap: The innovation behind Backblaze B2

From the beginning, Backblaze has architected its storage solution to be both performant and cost-effective. And, by specializing in storage (as opposed to the myriad of solutions offered by, say, Amazon Web Services and other hyperscalers), Backblaze is able to optimize for the economics of storage and storage alone.

To help you get past the price tag and into the technical details, let’s break down the pillars of Backblaze B2 security and compliance.

Compliance? We’ve got a visual for that.

Want a quick glance at how Backblaze compares to other cloud storage providers on key security and compliance elements? Check out our comparison matrices.

Architecture and physical security: The foundation of trust

Our security starts with our physical infrastructure. Our data centers are designed for 11 nines of data durability and are staffed 24/7/365. They feature:

Best-in-class security features: Biometric security, ID checks, and multi-layered access controls.
A purpose-built infrastructure: From Backblaze Storage Pods to projects like Shard Stash and ongoing feature releases, the Backblaze platform is designed for maximum data durability and security.

This physical and architectural security is the bedrock of our service, and it’s backed by industry-standard certifications like SOC 2 Type 2.

Data storage security: Protecting data at rest and in transit

Data security is a core tenet of our platform. From the moment your data leaves your system until it is stored on our pods, it is protected by multiple layers of encryption.

Encryption in transit: All files are transmitted to Backblaze B2 using an encrypted TLS connection.
Encryption at rest: Your data is encrypted before it is stored on disk. We offer two options for server-side encryption with 256-bit Advanced Encryption Standard (AES-256):
- Server-side encryption (SSE) with Backblaze-managed keys (SSE-B2): We handle the key management for you, providing seamless, built-in protection.
- SSE with customer-managed keys (SSE-C): For organizations with strict compliance requirements, you can manage your own keys, giving you complete control over your data’s access.
Object Lock for immutability: Our Object Lock feature provides a powerful layer of ransomware protection. Using a write-once, read-many (WORM) model, it prevents files from being modified, manipulated, or deleted for a customer-determined retention period. This is an essential tool for compliance and disaster recovery.
Cloud Replication: For businesses with high-availability or geographical redundancy requirements, Backblaze B2 supports automatic replication of data across different regions, ensuring your data is always available and safe from regional outages or other incidents.
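
As a rough illustration of the SSE-C and Object Lock options above, the sketch below builds the request headers that the S3 API defines for customer-provided keys and for a retention period. It assumes you are talking to an S3-compatible endpoint. The helper names are hypothetical, nothing is sent over the network, and you should confirm the exact headers and supported modes in your provider's documentation.

```python
import base64
import hashlib
import os
from datetime import datetime, timedelta, timezone


def sse_c_headers(key: bytes) -> dict[str, str]:
    """Build S3-style SSE-C headers for a customer-managed 256-bit key."""
    if len(key) != 32:
        raise ValueError("AES-256 needs a 32-byte key")
    return {
        "x-amz-server-side-encryption-customer-algorithm": "AES256",
        # The key itself and its MD5 digest are both sent base64-encoded.
        "x-amz-server-side-encryption-customer-key":
            base64.b64encode(key).decode("ascii"),
        "x-amz-server-side-encryption-customer-key-MD5":
            base64.b64encode(hashlib.md5(key).digest()).decode("ascii"),
    }


def object_lock_headers(now: datetime, retention_days: int,
                        mode: str = "COMPLIANCE") -> dict[str, str]:
    """Build S3-style Object Lock headers for a retention period.

    `now` must be timezone-aware; the date is emitted in UTC.
    """
    if mode not in ("COMPLIANCE", "GOVERNANCE"):
        raise ValueError("unknown Object Lock mode")
    retain_until = now.astimezone(timezone.utc) + timedelta(days=retention_days)
    return {
        "x-amz-object-lock-mode": mode,
        "x-amz-object-lock-retain-until-date":
            retain_until.strftime("%Y-%m-%dT%H:%M:%SZ"),
    }


# Example: a fresh random key and a 30-day retention window.
headers = {
    **sse_c_headers(os.urandom(32)),
    **object_lock_headers(datetime.now(timezone.utc), retention_days=30),
}
for name in sorted(headers):
    print(name)
```

With SSE-C the provider does not keep your key, so the same key has to be supplied again on every read. Losing it means losing access to the object.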

Access management security: Granting control and ensuring accountability

Controlling who can access your data is paramount. We provide granular, enterprise-grade access management controls that give you full command over your storage:

Fine-grained API key control: Create and manage accounts, groups, and specific data access permissions with robust API key control.
Multi-factor authentication (MFA) & single sign-on (SSO): We offer multiple account authentication options, including MFA and SSO via providers like Google Workspace and Office 365, to prevent unauthorized access.
Comprehensive logging: Backblaze provides detailed logs and reports on all activities within your account, so you can maintain a clear audit trail.

Compliance: Demonstrating our commitment to best practices

Security is not just a feature; it’s a commitment that’s verified by independent third parties. Backblaze has achieved a number of security and compliance attestations, including:

SOC 2, Type 2: We have been independently audited and certified for SOC 2, Type 2 compliance, demonstrating our commitment to protecting customer data.
HIPAA: For business customers who are Covered Entities under the Health Insurance Portability and Accountability Act (HIPAA), we can provide a Business Associate Agreement (BAA) upon request.
PCI-DSS: Backblaze’s adherence to the Payment Card Industry Data Security Standard (PCI-DSS) is supported by our use of Stripe to handle card information and our internal security controls.
GDPR: We adhere to General Data Protection Regulation (GDPR) privacy policies and provide Data Processing Agreement Addendums (DPAs) for EEA/EU and UK residents.

While some competitors may also offer these certifications, Backblaze’s pricing model is built to ensure you don’t have to pay a premium for them. Our efficiencies mean that we can pass the savings directly to you without compromising on the security and compliance that your business demands.

Specialized cloud storage: Enabling enterprises to evaluate their best options

In the end, our goal is to free you from the false choice between security and affordability. The reality is that the high cost of some cloud providers is a result of their complex, multi-tiered business models—not a reflection of superior security. Backblaze’s commitment to building a focused, innovative, and transparent cloud storage solution allows us to deliver on our promise: enterprise-grade security and compliance, at a fraction of the cost.

The post The Truth About Cloud Security Costs: Why High Costs Don’t Always Mean Better Protection appeared first on Backblaze Blog | Cloud Storage & Cloud Backup

Archiving: The Steady Driver of Media & Entertainment Storage

2025-09-11 Laquie TN Campbell

Post Syndicated from Laquie TN Campbell original https://www.backblaze.com/blog/archiving-the-steady-driver-of-media-entertainment-storage/

A decorative image showing media icons on a gradient background.

Industry research consistently shows that archiving and preservation remain the largest drivers of storage demand in media and entertainment workflows. As of 2024, archiving—including both new and historical assets—accounts for the majority of digital storage capacity.

This reflects an ongoing reality: Every production adds new terabytes of content, and studios/vendors are responsible for keeping it safe and usable long-term. And, for many M&E teams, traditional LTO tape libraries represent a cumbersome way to manage vast (and growing) archives.

Increasingly, the decision is less whether to adopt cloud, and more how to use it responsibly. For some, that means hybrid systems that balance performance and scalability. For others, particularly smaller studios, cloud may become the backbone of both active and deep archives.

Free ebook: Why Media Workflows Are Embracing Cloud Storage—On Their Own Terms

Cloud media workflows are quick to promise a solution, but it’s more important to learn how to navigate them. Read our ebook on how to use the cloud to best serve you and your team.

Why archiving dominates conversations—and what makes it so complex

Long-term preservation isn’t a “set-and-forget” task—technology evolves, file formats age, and migration becomes as essential as storage itself. Meanwhile, the shift to 4K, 8K, HDR, and immersive content means productions routinely generate petabytes of material.

The challenge isn’t just volume, but ensuring ongoing integrity, accessibility, and migration compatibility over decades. That makes both tracking current trends in file creation and future-proofing archives active, ongoing tasks.

While the challenge can look different to different types of M&E teams, there are benefits for all of them:

For editors and post-production teams, active archives keep high-resolution footage readily available, eliminating the frustration of digging through cold storage when a quick edit or repurposing request comes in.
For media asset managers, they transform archives into searchable, metadata-rich repositories that reduce retrieval time and prevent costly duplication of content.
For executives and producers, active archives protect past investments by making legacy assets easily accessible for remonetization in new markets, remasters, and marketing campaigns.
For IT and workflow engineers, they provide automated tiering and integration across on-prem and cloud systems, ensuring scalable performance without ballooning infrastructure costs.

Still, a recent NAB survey showed that archive capacity remains a challenge for 85% of respondents. Searchability is another weak spot, with some teams still relying on spreadsheets.

Why cloud adoption has been cautious

Although cloud storage offers flexibility, adoption in the M&E industry has been measured. Common concerns include:

High egress costs: Many providers charge significant fees for retrieving archived data. For media workflows that often involve moving large files in and out of storage, these costs can add up quickly.

Performance concerns: Latency and bandwidth limitations can disrupt workflows, especially in post-production environments that rely on fast access to high-resolution files.
Unpredictable workflows: Unlike enterprise archives where files may be rarely accessed, media archives are often “active.” Teams may suddenly need terabytes of content for a remastering project or marketing campaign. Cloud pricing models built around cold storage don’t always align well with this reality.
Trust and security: Especially in the early years of cloud, concerns around data sovereignty, intellectual property protection, and compliance slowed adoption. While cloud providers have strengthened their credentials in these areas, trust remains a consideration.
Established investments in on-prem: Many organizations already have significant capital invested in tape libraries, network attached storage (NAS), storage area network (SAN) systems, or colocation setups, making the shift to cloud a long-term transition rather than an overnight change.

How the cloud can help

The shift from “traditional” production to newer technologies in content filming and creation (including both hardware and software tools) leaves many M&E teams with several competing demands on their tech stacks. Cloud workflows can offer significant benefits for scalability, searchability, and budget management.

Elastic capacity

Cloud removes the need for large upfront capital investments and scales as archives grow. For organizations with fluctuating storage needs, this flexibility is particularly valuable.

Cost-tiering options

Cloud services now offer multiple archival tiers—from “hot” to “deep archive”—allowing teams to balance cost with access needs. Combined with lifecycle management policies, this helps align budgets with actual usage.
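
A lifecycle policy of this kind can be sketched as a small rule table. The tier names and idle-day thresholds below are hypothetical and do not correspond to any provider's actual tiers or lifecycle API. The sketch keys on last access rather than file age, since media archives are often active.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class TierRule:
    name: str
    min_idle_days: int  # move here once an asset has been idle this long


# Hypothetical tiers and thresholds, ordered from coldest to hottest.
RULES = (
    TierRule("deep-archive", 365),
    TierRule("cool", 90),
    TierRule("hot", 0),
)


def choose_tier(last_accessed: date, today: date, rules=RULES) -> str:
    """Return the coldest tier whose idle threshold the asset has reached."""
    idle_days = (today - last_accessed).days
    for rule in rules:
        if idle_days >= rule.min_idle_days:
            return rule.name
    # Fallback (for example, a last-access date in the future): hottest tier.
    return rules[-1].name


today = date(2025, 9, 11)
for accessed in (date(2025, 9, 1), date(2025, 5, 1), date(2024, 1, 1)):
    print(accessed, "->", choose_tier(accessed, today))
```

A real policy would also weigh retrieval fees and minimum storage durations for each tier, which this sketch leaves out.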

Hybrid approaches

The most common strategy today is hybrid: keeping frequently accessed assets on-premises or in private cloud, while offloading less active content to public cloud. Surveys show hybrid adoption has grown significantly in the last five years, with expectations that it will continue to rise.

Collaboration and accessibility

For global teams, cloud improves accessibility. Editors, producers, and marketing teams in different regions can access the same archival assets without relying on physical transfers, VPNs, or duplicated storage.

AI-enabled metadata

Cloud platforms also support AI and ML services that enrich metadata. This transforms archives from passive repositories into searchable, discoverable libraries—unlocking new value from existing content.

The future of media archiving is in the cloud

The move to cloud is gradual, shaped by cost, performance, and workflow realities. Yet the volume and importance of archives—and cloud-based workflows—continue to grow. When paired with thoughtful strategies, cloud storage offers a flexible way to manage that growth while unlocking new creative value.

By designing storage approaches that balance innovation with practicality, M&E teams can ensure archives remain accessible, secure, and ready to support the next generation of storytelling.

The post Archiving: The Steady Driver of Media & Entertainment Storage appeared first on Backblaze Blog | Cloud Storage & Cloud Backup

Introducing an Interactive Code Review Experience with Amazon Q Developer in GitHub

2025-09-09 Sundaresh Iyer

Post Syndicated from Sundaresh Iyer original https://aws.amazon.com/blogs/devops/introducing-an-interactive-code-review-experience-with-amazon-q-developer-in-github/

Code reviews are one of the most valuable rituals in software development. They help ensure quality, maintain consistency, and foster growth as engineers. But they’re also one of the most time-consuming steps in the software development lifecycle. A common pattern I’ve seen is a developer opening a pull request (PR), receiving automated or peer comments, and then needing to search through documentation, Slack threads, or past code just to understand why a change was suggested. That search for missing context creates friction that slows teams down, adds back-and-forth cycles, and often distracts from the bigger picture of building great products.

In the initial preview experience, teams used Amazon Q Developer in GitHub across issues and PRs for feature work, automated code reviews, and common modernization tasks. This kept work inside GitHub and reduced handoffs. Automatic reviews on new or reopened PRs surfaced findings early, but teams still wanted more context and a tighter loop inside the PR.

Today we’re introducing an interactive code review experience for PRs. You can ask Amazon Q Developer questions about any finding using /q, see a concise summary with threaded findings, and apply suggested changes without leaving GitHub. Code reviews by Amazon Q Developer now complete more quickly than before, which reduces wait time and shortens the review cycle so teams can merge sooner and spend more time building.

What’s new and why it matters

Interactive Conversations in the pull request: Comment with /q to get inline answers, or ask Q Developer to propose a code change you can apply in the PR. For example: /q explain this finding or /q propose a change that replaces class toggles with a data attribute for state.
Code review summaries with threaded findings: Each code review begins with a concise summary and findings are threaded underneath. This makes updates easier to follow and reduces noise.
Faster execution with clearer notifications: Amazon Q Developer completes its analysis more quickly, and notifications are organized and easier to scan. This reduces wait time and shortens the review cycle.
When you create or open a new PR, Amazon Q Developer automatically starts a code review if the code review feature is enabled for your GitHub installation in the Amazon Q Developer console. Subsequent commits do not trigger another automatic review. To run a fresh analysis, post /q review as a new comment on the PR.
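
The trigger rules in the paragraph above can be modeled as a small decision function. This is only a sketch of the behavior as described here, not Amazon's implementation. The event names are hypothetical, and it assumes a manual /q review comment is honored regardless of the automatic-review setting, which the text does not confirm.

```python
def should_run_review(event: str, review_enabled: bool,
                      comment_body: str = "") -> bool:
    """Model of when a code review starts, per the description above.

    `event` uses hypothetical names: "pr_opened", "pr_reopened",
    "commit_pushed", "comment_created".
    """
    if event in ("pr_opened", "pr_reopened"):
        # Automatic review only if the feature is enabled in the console.
        return review_enabled
    if event == "commit_pushed":
        # Subsequent commits do not trigger another automatic review.
        return False
    if event == "comment_created":
        # A fresh analysis is requested by posting "/q review".
        # Assumption: this works independently of the automatic setting.
        return comment_body.strip() == "/q review"
    return False


for event, body in [("pr_opened", ""), ("commit_pushed", ""),
                    ("comment_created", "/q review"),
                    ("comment_created", "/q explain this finding")]:
    print(event, repr(body), should_run_review(event, True, body))
```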

Getting Started with Amazon Q Developer in GitHub

To get started, install the Amazon Q Developer GitHub App in your GitHub organization or repository. The app is available through the GitHub Marketplace and can be used without an AWS account during the preview. During installation, you choose whether to provide access to all repositories or only selected repositories in your GitHub organization. You can increase free usage by registering the app installation in the Amazon Q Developer console.
For more details on installation, permissions, and configuration options, see the Amazon Q Developer for GitHub documentation. Once the app is installed, you can begin using Q Developer to review PRs automatically.

Using Amazon Q Developer in Pull Requests

To dive deeper, here’s an end-to-end walk-through of the new interactive code review experience using a simple card game I built with Amazon Q Developer.

Create a new pull request: In this example, I created a feature branch named demo, added a tailwind.css file to the JavaScript and HTML card game app, pushed the branch, and opened a PR for review.
Amazon Q Developer automatically started a code review, analyzing code quality, potential issues, and adherence to best practices. A concise summary appeared at the top, with individual findings threaded underneath. This gave me the big picture and the specifics in one place.
Review the summary and findings: I reviewed the summary and threaded findings to decide which change to take on first. Seeing both the rationale and the exact lines called out meant I knew where to begin, without hunting through files.
Ask for Clarification with /q: One of the findings suggested using a state property to track the card status in my card game application, so I asked Q Developer for clarification. It responded quickly with concrete context and pointers, which reduced back and forth and improved the quality of the review.
Continue the conversation (if needed): I reviewed Q Developer’s suggestion and replied that I preferred an alternate approach. Q Developer quickly returned a complete implementation I could apply in the PR.
Apply Fixes: After reviewing the suggested implementation, I clicked Commit suggestion to create a new commit on the PR branch with my username as the author.
Re-run the review: I didn’t need this for my example, but if you push additional changes, you can run a fresh analysis by posting /q review as a new top-level comment. Q Developer will run the review and post updated findings.

With the code review complete and checks passing, I merged. The new interactive code review experience reduced wait time and review cycles and made the “why” behind each finding and suggested change clear.

Conclusion

Amazon Q Developer for GitHub is available today in preview. Whether you are an individual developer or part of a large engineering team, this update helps you ship cleaner code with fewer cycles and makes code reviews something to look forward to rather than avoid.
Try it out on your next PR. Type /q, ask a question, and see how smarter conversational reviews transform your workflow.

Stack to Win: A Powerful Solution for Sports Media Production

2025-09-04 Dave Simon

Post Syndicated from Dave Simon original https://www.backblaze.com/blog/stack-to-win-a-powerful-solution-for-sports-media-production/

A decorative image showing the text Stack to Win with Boomer Esiason. In the background, the logos for Backblaze, Suite Studios, and Iconik are displayed on media screens.

I recently joined an incredible group of thought leaders for a panel discussion on the future of sports media. Hosted by sports commentator and former NFL MVP Boomer Esiason, our Stack to Win panel featured Jeremy Strootman from Iconik, Jay Maxwell from Suite Studios, the NFL’s VP of Broadcasting Mike North, and me—Dave Simon from Backblaze. Together, we explored the complexities of modern sports content creation and how our integrated cloud-native solutions from Backblaze B2, Iconik, and Suite offer a powerful blueprint for radically streamlining workflows and unlocking new opportunities for efficiency, speed, and monetization.

The traditional, linear model of sports media production is a thing of the past. It’s been completely changed by new technology and a shift in what fans expect. Today, media teams are in a real-time battle for attention against every other form of entertainment. This new world demands a different kind of setup, one that’s built for the cloud and designed to handle the entire media lifecycle. The solution we’ve built, a powerful combination of Backblaze, Iconik, and Suite Studios, is exactly that. It’s the playbook for staying ahead.

Watch the full interview

There’s so much more than we could summarize in just one blog post. Check out the full video below:

The (data) problem

Game day content is immense—we’re talking 6–7TB of data nightly. In the past, this was a logistical nightmare. As Jeremy Strootman from Iconik pointed out, “It used to be we’d get a hard drive and I’d get a hard drive, and we made sure that we just took different flights on the way home. It was literally that archaic.” When speed is everything, old methods like shipping hard drives are a huge liability.
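
To put 6–7TB a night in perspective, here is a back-of-the-envelope transfer-time calculation. It assumes decimal terabytes (10^12 bytes) and a link that is fully available for the upload. The link speeds are illustrative, not figures from the panel.

```python
def upload_hours(terabytes: float, link_gbps: float,
                 utilization: float = 1.0) -> float:
    """Hours needed to move `terabytes` over a link of `link_gbps`.

    Assumes decimal units: 1 TB = 1e12 bytes, 1 Gbps = 1e9 bits/second.
    `utilization` is the fraction of the link actually available.
    """
    bits = terabytes * 1e12 * 8
    seconds = bits / (link_gbps * 1e9 * utilization)
    return seconds / 3600


for gbps in (1, 10):
    print(f"7 TB at {gbps} Gbps: {upload_hours(7, gbps):.1f} hours")
```

Under these assumptions, 7TB takes roughly 15.6 hours on a fully used 1 Gbps link and about 1.6 hours at 10 Gbps, so the network path matters as much as the storage behind it.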

This pressure comes from fans who have an insatiable appetite for content across every platform imaginable. They expect teams to produce their own content in real-time for streaming and social media. For many, the “second screen” is now the main screen, with 73% of fans using mobile apps for real-time updates during live events. If your workflow is slow, you’ve already lost the competition.

The definition of sports content has also expanded. It’s no longer just about the game itself, but also the stories around it—the players’ lives and the team’s entire ecosystem. Jay Maxwell of Suite Studios captured this perfectly:

The product is not just what’s on the field anymore. It’s also what’s going on in these, you know, athletes’ lives, what’s going on in the peripheries of the team and the organizations.
—Jay Maxwell, Co-Founder and Chief Product Officer, Suite Studios

This includes pop culture crossovers, fantasy sports, and in-game betting, all of which demand instant video highlights.

A great example of this is when Eagles wide receiver AJ Brown was spotted reading a book called “Inner Excellence” on the sidelines. The moment went viral, and the book, which was previously ranked 585,000 on the bestseller list, vaulted to number one instantly. As the NFL’s Mike North noted, this is how fans can instantly “go deeper” and connect with their favorite players. The ability to capture and distribute these moments instantly is a fundamental requirement for success.

A modern technology stack

An integrated, cloud-native tech stack provides a seamless workflow that removes risk and speeds up the content pipeline. It’s a powerful combination of three key layers:

1. Foundation: The active cloud archive

Modern media workflows are built on a cloud storage foundation that replaces old systems like tape libraries and shelves full of hard drives. The key is an active cloud archive that gives you instant access to your footage. This eliminates the costly delays of older solutions and offers predictable costs, so you never get hit with surprise fees when you need to access your own content.

2. Intelligence: Media Asset Management (MAM)

This is the smart layer that makes your vast archive searchable and valuable. Instead of producers manually sifting through hours of footage, a multimodal AI search engine can find the exact clip they need in seconds. As I explained on the panel, you can use a natural language search to describe exactly what you’re looking for, such as “Jerry Rice catching a ball over his left shoulder wearing a white jersey.” AI tools in a media stack can automatically transcribe interviews, search for specific quotes, and even identify abstract concepts like emotion or reframe a video for different social media platforms.

3. Accelerator: Real-time cloud editing

This component handles the final stage of production, allowing editors to access high-resolution media without a download delay. This technology streams data directly from the cloud, so editors can start working immediately. This is how a remote team can instantly cut and create content from footage uploaded on the field.

The real magic is all of these elements combined: A clip is only useful if an editor can work with it right away, and a huge archive is only valuable if you can find what’s in it. This is a single, cohesive system that manages the entire media lifecycle from start to finish.

Reshaping the business of sports

Adopting a modern tech stack empowers rights holders—leagues, teams, and athletes—to manage and distribute content on a massive scale. They can bypass traditional media gatekeepers and build direct relationships with their fans. This opens up several possibilities, such as:

Archive monetization. Vast archives, once a simple cost center, have now become a major source of revenue. With an accessible, intelligent archive, organizations can unlock new revenue streams.

Licensing storefronts: You can create B2B portals for broadcasters and filmmakers to license and download footage, which essentially creates a self-service revenue engine.
Direct-to-consumer (DTC) fan platforms: Launch your own subscription services with exclusive access to historical games and behind-the-scenes content.
Free Ad-supported Streaming TV (FAST) channels: Program and launch FAST channels using repurposed archival content.
Creator economy partnerships: License parts of your archive to creators to reach new audiences and share in the revenue.
Enable the athlete as a media entity. This same technology is behind the rise of athletes as media producers. Today’s players are actively shaping their own stories and building media businesses. The low barrier to entry for these cloud workflows is the foundation of this movement, giving athletes the same scalable tools once reserved for major networks. A great example of this is Peyton Manning’s Omaha Productions, which started as a player-led media company and became a leader in the space.

The future fan experience

This revolution is transforming the fan experience from a one-way broadcast to something personal, interactive, and instant. The future of sports consumption is personalized feeds tailored to individual interests. As Mike North noted, “You don’t really need to watch the game anymore to still be a fan.” For a fan who wants to know everything about a player, a custom feed can be created. For a fantasy football enthusiast, clips and highlights related to their team can be pushed to them in real time.

The experience will also be interactive. Streaming platforms are already using augmented reality (AR) overlays and multi-angle camera views. The next step, powered by AI and accessible archives, is allowing fans to directly ask for content, like “Show me all the Hail Mary plays from this season,” and instantly get a custom playlist. This shifts passive viewing into active exploration.

For any sports organization, the biggest risk is standing still and maintaining the status quo. As Jay Maxwell put it, “The barrier to entry to try is, you know, cheap if not free.” An integrated, cloud-native workflow isn’t just a competitive advantage—it’s the fundamental requirement for survival and success.

The post Stack to Win: A Powerful Solution for Sports Media Production appeared first on Backblaze Blog | Cloud Storage & Cloud Backup

Cloud Storage Myths Debunked, Part Four: Managing Multiple Clouds Is Too Complicated

2025-09-02 David Johnson

Post Syndicated from David Johnson original https://www.backblaze.com/blog/cloud-storage-myths-debunked-part-four-managing-multiple-clouds-is-too-complicated/

A decorative image showing multiple types of storage media.

Today’s myth feels familiar, because it’s exactly what major cloud providers want you to believe:

Multi-cloud? Sounds like a recipe for chaos. Best stick with one and avoid the hassle.

On the surface, it sounds reasonable. The “big three” clouds promote their all-in-one ecosystems as seamless and unified. One provider, one bill, one console. That should make life easier, right?

Not so fast. The promise of simplicity often hides a different reality: deep complexity, rigid architecture, and vendor lock-in. For many cloud-native teams, what’s pitched as convenience ends up costing time, money, and agility.

This is the final post in our blog series debunking persistent myths about cloud storage. (You can read the first, second, and third articles in the series to get up to date.) And if you’ve ever been told that multi-cloud is messy or risky, this one’s for you.

New Cloud Native Times Call for New Cloud Storage Approaches

Learn more about how the open cloud supports faster development, improved workflows, and reduced cost complexity in our free ebook, “New Cloud Native Times Call for New Cloud Storage Approaches.”

Integrated ≠ simple

The idea that one provider equals less overhead is seductive. But in practice, integration can mean entanglement. Instead of reducing operational drag, it creates a tightly woven web of interdependent services and proprietary systems, which makes changes slow and expensive.

The all-in-one trap

Big cloud providers’ platforms are sprawling by design. They’re built to meet every need under one roof. However, the supposed benefit of simplicity falls apart once you actually start using that provider’s full range of services.

When it comes to storage alone, teams must navigate:

Multi-tier systems (hot, cool, cold) with different costs, speeds, and access methods, each requiring its own performance and cost configuration.
Complex lifecycle policies and scripts to automate data movement between tiers.
Intricate IAM setups to manage roles, policies, and permissions across services.
Proprietary APIs and tooling that create migration headaches and limit portability.
Console UIs and behaviors that change depending on region or storage class, adding to the learning curve.
Deep interdependencies across services, where a small change in storage can ripple through compute, networking, and security layers.

What starts as a unified experience quickly becomes a maze of shifting rules and unstable configurations. And the more deeply your architecture relies on these moving parts, the more frustrating your operations become.

Frustration, not flexibility

Not only is it tedious to manage all of this, but it can be risky. Miss a lifecycle rule, and you might incur unexpected fees. Misconfigure access policies, and you could lose visibility into or even access to your own data.

These aren’t edge cases. They’re everyday realities in cloud-native environments where time is tight, systems are complex, and DevOps teams are stretched thin. Here’s what that might look like in practice:

A developer spins up a test environment without realizing data is landing in a high-cost tier.
An SRE responds to a latency issue, only to discover the data lives in cold storage and restoring it generates retrieval costs and delays.
A backup job fails silently because of a permissions misconfiguration buried deep in nested IAM roles.
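The cold-storage scenario above can be caught before it turns into an incident. Below is a minimal sketch, assuming a bucket listing shaped like the response from boto3's `list_objects_v2` call; the function name `archived_keys` is hypothetical, and the storage class names are the AWS S3 ones, so other providers will use different labels.

```python
# Storage classes that need a restore step before data can be read.
# These are AWS S3 class names; other providers use different labels.
ARCHIVE_CLASSES = {"GLACIER", "DEEP_ARCHIVE"}


def archived_keys(listing):
    """Return keys from a list_objects_v2-style response that sit in an archive tier."""
    return [
        obj["Key"]
        for obj in listing.get("Contents", [])
        # Listings normally include StorageClass; default to STANDARD if absent.
        if obj.get("StorageClass", "STANDARD") in ARCHIVE_CLASSES
    ]
```

Running a check like this against the datasets a service depends on is one way to find out about retrieval delays before an on-call engineer does.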

And when something breaks, support isn’t always fast or personal. Unless you’re a top-tier customer, you’re likely working through ticketing systems, documentation loops, or community forums.

These moments don’t just cause frustration—they drain time, inflate costs, and hinder your team’s ability to move fast with confidence.

Complexity isn’t a multi-cloud problem. It’s a design problem.

Let’s revisit the myth: Multi-cloud is too complicated.

It’s an understandable concern, but one that’s often based on frustration inside a major cloud provider’s ecosystem. When teams talk about complexity, they’re usually describing the friction that comes from navigating sprawling services, managing brittle configurations, and troubleshooting opaque policies within one provider.

The real issue isn’t how many clouds you use. It’s how much complexity one provider can introduce when you try to adapt, integrate, or scale. Vendor-specific tooling, tightly coupled services, and unpredictable costs create the illusion of simplicity—until you need to do something the platform didn’t anticipate.

That’s not a multi-cloud problem. That’s a design problem.

Making multi-cloud work for you

For many teams, multi-cloud isn’t a grand strategy, but something that happens organically. AI workloads move to GPU providers. Content gets delivered through specialized CDNs. Backups shift to more cost-effective and geographically separated storage. Whether by design or necessity, most modern architectures already span multiple clouds.

So the smarter question isn’t “Should we avoid it?” It’s “How do we make it sustainable without adding unnecessary complexity?”

That’s where Backblaze B2 comes in.

Backblaze B2 is purpose-built to make multi-cloud not only possible, but practical—for DevOps, SREs, and developers alike. It’s focused, interoperable, and refreshingly straightforward:

Always-hot storage: No tier juggling. No lifecycle scripts. Just fast, consistent access.
S3-compatible APIs: Seamlessly integrates with the tools and platforms you already use, such as Terraform, Kubernetes, ArgoCD, boto3, and more.
Streamlined IAM and UI: Control access and monitor usage without wading through layers of enterprise-grade configuration.
Free egress: Move data where you need it and when you need to, without the surprise charges that make multi-cloud cost-prohibitive.

Many teams start small with offloading archives, mirroring backup buckets, or feeding GPU pipelines for AI training. As modular architectures grow, Backblaze B2 scales with them, but without the rigidity or lock-in.

In fact, a 2025 Enterprise Strategy Group study found that many operational tasks, such as storage deployment, storage management, and integration with customers’ existing hardware and software, took up to 92% less time.

The simple interface contrasted sharply with other Cloud Service Providers’ interfaces that have confusing navigation and multiple options to sort through.

—Enterprise Strategy Group, “Analyzing the Economic Benefits of the Backblaze B2 Cloud Storage Platform,” May 2025.

Multi-cloud doesn’t have to be messy. With the right storage layer, it becomes your cleanest, most strategic advantage.

Want to dig even deeper?

Download the full ebook New Cloud-Native Times Call for New Cloud Storage Approaches to explore how modular, interoperable strategies are changing the cloud-native game.

The post Cloud Storage Myths Debunked, Part Four: Managing Multiple Clouds Is Too Complicated appeared first on Backblaze Blog | Cloud Storage & Cloud Backup

The Gold Standard of Cloud Security: Why Our SOC2 Type 2 Compliance Sets Backblaze Apart

2025-08-27 Kari Rivas

Post Syndicated from Kari Rivas original https://www.backblaze.com/blog/the-gold-standard-of-cloud-security-why-our-soc2-type-2-compliance-sets-backblaze-apart/

A decorative image showing a server, a drive, a NAS device, and a cloud.

As more organizations rely on the cloud to store critical data, the stakes around compliance and security keep rising. Regulations like GDPR and HIPAA are putting pressure on businesses to demonstrate that their data handling practices are sound, and customers increasingly want evidence—not just assurances—that their data is protected.

Every cloud provider claims to be “secure.” But as a risk owner and decision-maker, you need more than a marketing tagline. You need proof. That’s where SOC 2 Type 2 compliance comes in. At Backblaze, we don’t just meet this benchmark—we go beyond it. Unlike many cloud storage providers (CSPs) that may have only SOC 2 compliant data centers, Backblaze has also undergone the rigorous SOC 2 assessment at the company level.

What is SOC 2, and why does it matter?

SOC 2 (short for System and Organization Controls 2) is an assessment framework created by the American Institute of Certified Public Accountants (AICPA). It evaluates how service providers operate against five Trust Services Criteria:

Security
Availability
Confidentiality
Privacy
Processing integrity

Every SOC 2 assessment includes Security as the foundation, and organizations may also be evaluated against additional criteria that align with their services. Our assessment covers both Security and Availability, demonstrating that our systems are protected against unauthorized access and are resilient, reliable, and consistently accessible when you need them.

At Backblaze, we’ve put the right controls in place to meet these standards, such as:

Strong access management policies.
Redundant infrastructure to protect uptime.
Regular penetration testing and incident response reviews.

The business impact? You can rely on us to keep your data safe and accessible—without adding unnecessary risk to your operations.

Type 1 vs. Type 2: A key distinction

There are two types of SOC 2 examinations:

Type 1 shows that a company has the right controls in place at a specific point in time.
Type 2 goes further by validating that those controls are consistently followed and effective over a defined period.

Backblaze has achieved and consistently maintained SOC 2 Type 2 compliance. That distinction matters—it means you’re not just trusting that we say the right things, but that we do the right things, day in and day out.

What SOC 2 compliance delivers

SOC 2 compliance isn’t just a checkbox exercise. It provides meaningful assurances that directly affect your business:

Risk mitigation: Independent validation that controls work as intended.
Trust and credibility: Confidence that your cloud provider takes security seriously.
Vendor due diligence: Simplifies compliance reviews for your team.
Data integrity & availability: Assurance that your data remains reliable and accessible.

In short, SOC 2 compliance reduces uncertainty—making it easier for you to move forward with cloud adoption and scale with confidence.

SOC 2 data centers vs. SOC 2 as a company

It’s important to distinguish between compliance at the data center level and compliance at the company level.

SOC 2 compliant data centers: These examinations focus on the physical facility, including access controls, environmental monitoring, and fire suppression. Many CSPs rely on SOC 2 compliant facilities.
SOC 2 compliance as a company: This examination covers the provider’s internal operations, including policies, processes, and personnel practices. It examines how the service is built, run, and maintained.

Backblaze offers both. Our data centers are SOC 2 compliant, and our company is also SOC 2 Type 2 compliant.

Think of it like a bank: Secure vaults are critical (data centers), but so are strong internal policies and trained staff (company compliance). And, of course, you want both. That’s what we call defense in depth—end-to-end assurance that reduces risk and builds trust.

Surprisingly, you’ll find that many CSPs have SOC 2 data centers, but do not hold SOC 2 compliance at the company level.

Inside the SOC 2 audit process

SOC 2 evaluations are performed by independent third-party CPA firms, which ensures the results are objective and credible. The process includes:

Scoping: Identifying which systems and processes are reviewed.
Control documentation: Recording policies and procedures.
Evidence collection: Proving that controls are in place.
Testing & evaluation: Verifying effectiveness over time.
Reporting: Delivering findings in a formal report.

At Backblaze, this isn’t a one-and-done exercise. We undergo annual audits, maintain robust monitoring, and test our systems regularly. For example:

Incident response plans, playbooks, and processes are reviewed and updated as needed.
Penetration testing, the public bug bounty program, and our vulnerability management processes are designed to proactively identify, evaluate, prioritize, and remediate potential vulnerabilities.
Change management ensures updates don’t introduce unnecessary risk.

Each step reinforces our commitment to security and transparency—so you don’t have to take our word for it.

Policies that protect your data

Policies and processes are the backbone of an effective SOC 2 program. At Backblaze, these policies aren’t just written down; they’re embedded in how we operate every day.

Change management (Security, Availability)

Changes that impact our systems, infrastructure, or software are controlled, tested, and approved before release. This prevents unauthorized or accidental changes that could disrupt operations or compromise security. For customers, this means you can rely on a stable, reliable storage platform that won’t jeopardize your workflows.

Logging & monitoring (Security, Availability)

We log system activities, monitor access attempts, and alert on high priority security events around the clock. We have implemented features such as Anomaly Alerts to support notifying customers about unusual file upload and download patterns. Bucket Access Logs give you visibility into who accessed your data and when—adding both accountability and an audit trail for incident response.
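To make the idea of an unusual access pattern concrete, here is a minimal sketch of one generic approach: comparing each day's download count with a trailing window. This is not a description of how Anomaly Alerts works internally; the function name, window size, and threshold are all assumptions chosen for illustration.

```python
from statistics import mean, pstdev


def flag_unusual(counts, window=7, threshold=3.0):
    """Return indexes whose value deviates sharply from the trailing window."""
    flagged = []
    for i in range(window, len(counts)):
        history = counts[i - window:i]
        mu = mean(history)
        sigma = pstdev(history)
        if sigma == 0:
            # A perfectly flat history: treat any change as unusual.
            if counts[i] != mu:
                flagged.append(i)
        elif abs(counts[i] - mu) / sigma > threshold:
            flagged.append(i)
    return flagged
```

A production system would likely account for seasonality and per-bucket baselines, but the basic shape is the same: establish what normal looks like, then alert on large departures from it.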

Media handling & drive destruction (Security)

Physical media like drives are tightly controlled throughout their lifecycle. When a drive reaches end-of-life, it undergoes a secure erasure process. If it is not able to be securely erased, the device is destroyed, ensuring data is completely unrecoverable.

Environmental security (Availability)

Protecting data also means protecting the environment where it lives. Our data centers are equipped with redundant power and cooling systems, fire suppression, and environmental monitoring. Facilities are staffed 24/7/365 to respond to incidents in real time. These measures ensure uptime and business continuity—even in the face of physical disruptions like outages or natural disasters.

Each of these policies maps directly back to Trust Services Criteria, but more importantly, they translate into reduced risk, stronger reliability, and greater peace of mind for your business.

Why Backblaze stands apart

If you’re evaluating cloud storage providers, you can request a copy of our SOC 2 Type 2 report through Whistic. Backblaze currently offers three profiles on Whistic: Education Industry, EU Customers, and All Other Customers. Once you have signed up or signed in, you will be able to view or download the applicable documents and questionnaires.

Backblaze’s combination of SOC 2 compliant data centers and company-wide SOC 2 Type 2 compliance provides a higher level of assurance than many providers offer. That additional assurance is a powerful differentiator, especially for businesses in regulated industries.

And we’re not stopping here. Security isn’t static. We commit to annual assessments, continuous monitoring, and adapting to new threats as they emerge—so you can trust that your data is in good hands today, tomorrow, and beyond.

The post The Gold Standard of Cloud Security: Why Our SOC2 Type 2 Compliance Sets Backblaze Apart appeared first on Backblaze Blog | Cloud Storage & Cloud Backup

Cloud Storage Myths Debunked, Part Three: Onboarding Specialized Providers Is Too Hard

2025-08-22 David Johnson

Post Syndicated from David Johnson original https://www.backblaze.com/blog/cloud-storage-myths-debunked-part-three-onboarding-specialized-providers-is-too-hard/

Here’s a myth we hear again and again:

Integrating a new storage provider is too complicated. Migrating data, retraining teams, and reconfiguring tools will take too long and create too much risk.

It’s understandable. Data migrations have a reputation for being messy and disruptive. And let’s be honest, nobody wants to babysit infrastructure when there are products to build. For many teams, just the thought of switching cloud providers feels like a detour they don’t have time for.

But in reality, that fear is often bigger than the actual lift. If your workflows already use standard tooling like S3-compatible APIs, switching to a specialized provider is more like a well-marked exit than a hard left turn.

This is the third post in our series debunking persistent myths about cloud storage (check out the first and second posts) and explaining why a best-of-breed, interoperable approach is actually less disruptive than sticking with legacy hyperscaler models.

Migration anxiety vs. reality

“Storage migration” can sound like it requires weeks of planning and an army of engineers. But if your apps are already using S3-compatible workflows, most of the heavy lifting is already done.

If you know S3, you’re already ready

Many specialized storage providers now support S3-compatible APIs, allowing teams to keep the tools, scripts, and services they already know, such as Terraform, Kubernetes, ArgoCD, boto3, and MinIO.

And because your teams are already familiar with the S3 API and related tools, retraining isn’t a hurdle. The same skills, scripts, and automation frameworks carry forward, keeping onboarding time minimal. In fact, most teams are surprised by how little they need to change to get started.

That means:

No need to learn a new SDK or storage interface
No retraining your DevOps team
No rewriting automation pipelines or batch jobs

In most cases, all it takes is updating your endpoint URL and refreshing credentials. The mental model stays the same, the tools stay the same, and your workflows continue as-is.
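As a rough illustration of how small that change can be, the sketch below builds the connection settings for a boto3 S3 client. The helper names, environment variable names, and default values are hypothetical; the endpoint follows the `s3.<region>.backblazeb2.com` pattern, but you should use the one shown for your own bucket.

```python
import os


def s3_client_settings(endpoint_url, key_id, secret_key):
    """Build keyword arguments for boto3.client("s3", ...).

    Only the endpoint and credentials differ between providers; the
    application code that calls the client stays the same.
    """
    return {
        "endpoint_url": endpoint_url,
        "aws_access_key_id": key_id,
        "aws_secret_access_key": secret_key,
    }


def make_client(settings):
    # Imported lazily so the settings helper works without boto3 installed.
    import boto3

    return boto3.client("s3", **settings)


# Hypothetical values: read the real ones from your own secret store.
new_provider = s3_client_settings(
    endpoint_url=os.environ.get(
        "S3_ENDPOINT", "https://s3.us-west-004.backblazeb2.com"
    ),
    key_id=os.environ.get("S3_KEY_ID", "example-key-id"),
    secret_key=os.environ.get("S3_SECRET_KEY", "example-secret"),
)
```

Calling `make_client(new_provider)` returns a client whose `put_object`, `get_object`, and `list_objects_v2` calls look identical to the ones your scripts already make.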

You don’t need to rip and replace

Downtime concerns are one of the biggest sources of hesitation when switching providers. But in practice, migrations to S3-compatible cloud storage providers rarely require full cutovers or risky, all-or-nothing switchovers. With a bit of planning, most teams handle migrations incrementally:

Start by migrating lower-risk datasets, such as backups or archives.
Validate configurations and permissions as data lands in the new system.
Slowly expand to production datasets as confidence grows.

Better yet, you don’t have to move everything at once. Many teams adopt phased transitions, running some buckets side-by-side or writing to both systems during the handoff to minimize risk. With a bit of planning and the right migration tools and guidance, you can keep operations stable while gradually shifting workloads at a comfortable pace.
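Writing to both systems during the handoff can be sketched as a thin wrapper around two stores. This is an illustrative pattern under stated assumptions, not a specific migration tool: `DualWriter` and `InMemoryStore` are hypothetical names, and in practice each store would wrap a real bucket client.

```python
class DualWriter:
    """Write every object to both stores; read from the primary only."""

    def __init__(self, primary, secondary):
        self.primary = primary
        self.secondary = secondary
        self.failed_keys = []  # keys that still need syncing to secondary

    def put(self, key, data):
        # The primary write must succeed, so let its exceptions propagate.
        self.primary.put(key, data)
        try:
            self.secondary.put(key, data)
        except Exception:
            # Record the miss instead of failing the request, so a later
            # backfill job can copy the object across.
            self.failed_keys.append(key)

    def get(self, key):
        return self.primary.get(key)


class InMemoryStore:
    """Stand-in for a real bucket wrapper, used only for illustration."""

    def __init__(self):
        self.objects = {}

    def put(self, key, data):
        self.objects[key] = data

    def get(self, key):
        return self.objects[key]
```

Once the secondary has caught up and `failed_keys` stays empty, swapping which store is primary completes the cutover without a big-bang switch.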

Metadata isn’t a blocker

Migrating files without metadata continuity can break downstream systems, especially if your applications rely on timestamps or version tracking.

Fear not. S3-compatible cloud storage providers can preserve metadata during migration, including timestamps. That means your historical data stays intact and compliant with internal policies or regulatory needs, and you won’t need to reset or alter your data management policies.
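One generic way to carry metadata across is to copy it explicitly with each object. The sketch below assumes two boto3-style S3 clients and uses only `get_object` and `put_object`; the function name and the `src-last-modified` metadata key are hypothetical. It records the original timestamp as user metadata, which is a different technique from any provider-specific timestamp feature.

```python
def copy_with_metadata(src, dst, src_bucket, dst_bucket, key):
    """Copy one object between S3-compatible clients, keeping its metadata."""
    obj = src.get_object(Bucket=src_bucket, Key=key)

    metadata = dict(obj.get("Metadata", {}))
    # The destination assigns its own LastModified on upload, so keep the
    # original timestamp as user metadata. The key name is hypothetical.
    metadata["src-last-modified"] = obj["LastModified"].isoformat()

    dst.put_object(
        Bucket=dst_bucket,
        Key=key,
        # Reads the whole object into memory; fine for a sketch, but large
        # objects would need streaming or multipart uploads.
        Body=obj["Body"].read(),
        Metadata=metadata,
        ContentType=obj.get("ContentType", "binary/octet-stream"),
    )
    return metadata
```

Dedicated migration tools handle this at scale, but the principle is the same: metadata travels with the object rather than being regenerated at the destination.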

Moving isn’t the risk. Staying locked in is.

Let’s flip the narrative. The real risk isn’t switching; it’s staying stuck.

Major cloud provider ecosystems are designed for lock-in. The deeper you go, the harder it becomes to leave. Features that look like conveniences, such as integrated IAM policies, tiered storage, and custom APIs, often become entanglements over time.

Each of these layers is built to reinforce reliance:

IAM rules tie access tightly to the provider’s own tooling.
Tiered storage creates dependencies on lifecycle rules and retrieval thresholds.
Custom APIs mean even basic storage functions can require provider-specific logic.

And as you expand your usage—adding compute, networking, and security services—everything becomes interdependent. What starts as convenience evolves into constraint. Even small changes to your stack can trigger cascading reviews, system audits, or full reconfigurations.

The result? Innovation slows. Costs creep up. Flexibility disappears.

With a specialized provider, you break that cycle.

Specialized Doesn’t Mean Complicated

Specialized storage doesn’t complicate onboarding. It streamlines it. Solutions like Backblaze B2 are purpose-built to make this shift smooth and sustainable, without the trade-offs or surprises you might expect from switching providers.

S3 compatibility allows for seamless integration with the tools and workflows your team already uses.
Granular control means you can choose the tools and providers that work best for your architecture, not the ones bundled into a vendor’s ecosystem.
Metadata continuity is supported through features like custom upload timestamps, preserving file context during migration.
Transparent pricing ensures there are no hidden egress fees, transaction charges, or retention penalties to catch you off guard.
Hands-on support helps you plan, validate, and scale your migration with confidence and minimal disruption.

Breaking out of a single-vendor ecosystem may feel intimidating, but it’s often the fastest way to simplify operations, improve performance, and regain control over your cloud strategy.

The best part? Once you’ve made the move, you’re free to experiment. Multi-cloud strategies become more accessible. Your architecture becomes more modular. And your team can focus on building, not babysitting infrastructure.

Next Up: In the final post in this series, we’ll tackle Myth #4: Managing multiple clouds is too complicated. (Spoiler: It doesn’t have to be.)

Want to dig even deeper? Download the full ebook “New Cloud-Native Times Call for New Cloud Storage Approaches.”

The post Cloud Storage Myths Debunked, Part Three: Onboarding Specialized Providers Is Too Hard appeared first on Backblaze Blog | Cloud Storage & Cloud Backup

Cloud Storage Myths Debunked, Part Two: Storage Isn’t a Big Enough Problem to Remediate

2025-08-19 David Johnson

Post Syndicated from David Johnson original https://www.backblaze.com/blog/cloud-storage-myths-debunked-part-two-storage-isnt-a-big-enough-problem-to-remediate/

A decorative image showing devices on a cloud background.

Today’s myth might sound familiar:

Storage is a minor cost, so it’s not worth switching from major cloud providers.

It’s easy to see how this thinking takes hold. In many cloud-native projects, storage can be the last concern. Compute, networking, and database services often drive most of the costs. Storage? That’s just where your data sits.

But hidden fees, unpredictable retrieval charges, and surprising performance constraints make storage far more impactful than many teams realize—especially for cloud-native workflows.

This post is the second in our blog series unpacking four of the most common myths and misconceptions about specialized cloud storage (see the first post here) and explaining why an interoperable, best-of-breed approach can enhance and streamline cloud-native app development.

Underestimate the importance of storage at your peril

Major cloud providers offer plenty of storage options, but the real costs aren’t always clear up front.

The charges you don’t see coming

Between egress charges, API call fees, transaction costs, and minimum retention charges, even small changes in how your application moves or retrieves data can quickly inflate your bill.

What looked affordable at launch can snowball into sticker shock once traffic increases or new workloads start driving more data in and out of storage. A single spike in user activity or analytics queries can trigger thousands (or millions) of storage transactions, each billed individually.

These expenses add up fast, and they’re tough to predict.
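A back-of-the-envelope model shows how this plays out. Every rate and volume below is made up for illustration and does not reflect any provider's actual price list; the point is only that storage stays flat while egress and per-request charges scale with activity.

```python
from dataclasses import dataclass


@dataclass
class Rates:
    """Per-unit prices. The values used below are invented for illustration."""

    storage_per_gb: float     # per GB stored per month
    egress_per_gb: float      # per GB transferred out
    per_1000_requests: float  # per 1,000 API calls


def monthly_bill(rates, stored_gb, egress_gb, requests):
    """Break a month's charges into storage, egress, and API components."""
    storage = stored_gb * rates.storage_per_gb
    egress = egress_gb * rates.egress_per_gb
    api = requests / 1000 * rates.per_1000_requests
    return {
        "storage": storage,
        "egress": egress,
        "api": api,
        "total": storage + egress + api,
    }


rates = Rates(storage_per_gb=0.02, egress_per_gb=0.09, per_1000_requests=0.005)

# Same 10 TB stored in both months; only the traffic changes.
quiet = monthly_bill(rates, stored_gb=10_000, egress_gb=500, requests=2_000_000)
spike = monthly_bill(rates, stored_gb=10_000, egress_gb=8_000, requests=60_000_000)
```

With these invented numbers, the quiet month totals roughly 255 and the spike month roughly 1,220, even though the amount of data stored never changed.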

The egress trap

Egress charges might be one of the best-kept open secrets among the “big three” cloud providers. Every time data leaves their environment—whether to a CDN, another cloud, or end users—egress fees kick in. And they aren’t trivial.

Frequent data transfers, a hallmark of cloud-native architectures, can quietly devour budgets. The big dogs know this. Once your data is deep in their ecosystem, pulling it out becomes financially painful.

This creates a subtle but powerful form of vendor lock-in, making it harder to shift workloads or storage to more specialized providers.

Vendor lock-in, by design

Major cloud providers bundle storage, compute, networking, and a long list of services into tightly coupled ecosystems. On paper, that integration offers convenience. In practice, it creates real friction if you ever want to move.

Even when using open standards like the S3 API, migrating workloads can require new tooling, careful planning, and extensive testing. Under tight deadlines, the mere prospect of switching providers can feel too risky to attempt.

It’s not just inertia; it’s engineered friction designed to keep you tethered.

Complex storage slows down everything

Bundling storage inside a big cloud provider’s stack might seem efficient, but it often creates fragile setups that slow teams down. Configurations get complicated fast:

Hot and cold storage tiers
Lifecycle rules
IAM policies
Interdependent compute pipelines

Every added layer increases the odds that something breaks, pulling engineers into troubleshooting instead of building.

Latency-sensitive workloads such as real-time analytics or streaming services are especially vulnerable. Even small missteps can ripple through the user experience.

And when those issues hit, teams scramble to patch things up, burning time and resources that could be better spent moving products forward.

AI workloads bring storage costs into sharp focus

AI-powered applications, from model training to updating retrieval-augmented generation (RAG) pipelines, put heavy demands on storage. These workloads hammer systems with high-throughput reads and writes.

Each refresh or update adds to your bill:

Delete penalties
Retention minimums
API call surcharges

When teams start rationing runs, batching updates, or delaying refreshes just to control costs, innovation slows.

Specialized storage keeps costs predictable and workloads agile

Unlike the “big three” cloud providers, who often hide complexity behind convenience, specialized cloud storage providers like Backblaze B2 take a more transparent approach:

Clear, predictable pricing means no surprise egress fees, retrieval costs, API charges, or deletion penalties.
Always-hot storage eliminates the need for lifecycle policies and tier management.
Open architecture means you stay in control—no proprietary hooks, no walled gardens, and no painful unwinding if your needs change down the road.

For cloud-native teams, this isn’t just a storage swap; it’s an operational upgrade. Streamlined management, lower risk, and transparent costs mean teams can focus on shipping new products and features, not decoding their storage bills.

In fact, Enterprise Strategy Group released a comprehensive analysis of the economic benefits of Backblaze B2 in May 2025. The analysis concluded that Backblaze B2’s storage costs (monthly storage cost + cost of downloads + cost of transactions) were 3.1x to 3.2x lower than those of alternative cloud storage providers.

Simple, transparent, affordable pricing enabled Backblaze B2 users to spend far less on storage and use the savings to innovate and grow.
—Enterprise Strategy Group
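For readers who think in percentages rather than multiples, the conversion is simple arithmetic. The sketch below assumes that “N times lower” means the alternative's cost divided by N, which is the usual reading but is an interpretation rather than something the study excerpt spells out; the function name is hypothetical.

```python
def savings_fraction(times_lower):
    """Fraction saved if a cost is `times_lower` times lower than a baseline."""
    return 1 - 1 / times_lower


for factor in (3.1, 3.2):
    print(f"{factor}x lower -> {savings_fraction(factor):.1%} saved")
```

Under that reading, both ends of the range work out to a little over two-thirds off the baseline bill.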

What’s next for you and storage?

If you liked this article, check out the first in the series, “Cloud Storage Myths Debunked: Hyperscaler Storage Is Good Enough for Cloud-Native Apps.” And, stay tuned for the next post in this series, where we’ll tackle myth #3, addressing whether onboarding specialized providers is too hard. (It’s easier than you think.)

Want to dig even deeper? Download the full ebook “New Cloud-Native Times Call for New Cloud Storage Approaches.”

The post Cloud Storage Myths Debunked, Part Two: Storage Isn’t a Big Enough Problem to Remediate appeared first on Backblaze Blog | Cloud Storage & Cloud Backup

The Compliance Arms Race: What GovRAMP Means for SLED, Cloud Vendors, and the Rest of Us

2025-08-13 Kari Rivas

Post Syndicated from Kari Rivas original https://www.backblaze.com/blog/the-compliance-arms-race-what-govramp-means-for-sled-cloud-vendors-and-the-rest-of-us/

A decorative image showing a server, a NAS, and a computer.

If you’ve spent any time sourcing, evaluating, or speculating about cloud services in the public sector lately, you’ve likely felt it: the arms race happening in compliance. Courting customers from schools to statehouses to national labs, more and more cloud vendors are racing to pin the next security badge to their lapel—GovRAMP (formerly known as StateRAMP), TX-RAMP, FedRAMP, SOC 2, and on and on.

And while it might feel like a compliance bingo card, there’s real strategy and real consequences behind this sprint. At the heart of it all is the SLED market (state and local government, and education)—a sprawling patchwork of institutions tasked with safeguarding citizen data and taxpayer trust, all while operating with limited resources and infrastructure budgets.

Let’s talk about why this compliance arms race exists, what it means for buyers and vendors alike, and how we at Backblaze are choosing to compete not just with checkboxes, but with character.

Why does SLED even need unified standards?

Public sector IT has long been a security quilt. Some agencies are stitched up with advanced defenses, others more… threadbare. While some agencies may have advanced security tooling, a K–12 school district might still be running on legacy systems and duct tape. Yet both manage data that’s increasingly digital, distributed, and vulnerable.

The result? Inconsistent practices and rising risks. Enter: GovRAMP.

What is GovRAMP?

Short for Government Risk and Authorization Management Program, GovRAMP was designed to standardize cloud security for state and local agencies. It’s based on the same set of controls as FedRAMP, which are derived from the National Institute of Standards and Technology (NIST) SP 800-53, a catalog of controls for organizations to manage cybersecurity and privacy risk. GovRAMP ensures that even the smallest public institutions can procure secure IT solutions without reinventing the wheel every time.

GovRAMP was originally launched as StateRAMP, but has since grown beyond state lines, evolving into a broader framework adopted by local governments and school systems. Today, it’s a rigorous, independent audit program that holds vendors to a high set of security controls. Translation: If a vendor is GovRAMP-authorized, they’re playing in the big leagues of cloud security.

The alphabet soup of compliance: TX-RAMP, GovRAMP, FedRAMP

If you’re in Texas, you’re probably familiar with TX-RAMP, the state’s specific compliance framework. The good news? GovRAMP and TX-RAMP are closely aligned. At Backblaze, our GovRAMP Progressing Snapshot status qualifies us for TX-RAMP Provisional Authorization as well—one less hurdle for Texas agencies seeking secure, scalable cloud storage.

As for FedRAMP, it remains the gold standard for federal data, but for the vast majority of public sector orgs, including most SLED agencies, it’s simply unnecessary.

How GovRAMP streamlines cloud sourcing

Here’s where the compliance arms race actually makes things easier: Once a vendor is authorized through GovRAMP, SLED buyers can trust that the solution meets certain security standards, saving months of one-off vetting, paperwork, and duplicated audits. In a procurement environment plagued by inefficiency, that’s no small thing.

Especially now, as budgets tighten and AI-driven everything drives demand for flexible infrastructure, reducing sourcing friction matters more than ever.

Going beyond checklists: What buyers should really look for

Checkboxes alone don’t guarantee real-world resilience. Compliance can become its own form of security theater. It looks good on paper but falls short in practice. That’s why buyers should dig deeper.

Look for vendors who not only pass audits but live and breathe their controls. That means going beyond annual assessments and embracing security as a continuous, integrated discipline. The best partners are transparent, proactive, and thoughtful about risk—not just checking boxes, but building real-world resilience. Here are a few signs to look for:

Continuous monitoring and internal audits: They treat compliance as an ongoing process, not a once-a-year scramble.
Clear, accessible documentation: Security policies, certifications, and standardized independent attestations are available (under NDA if needed), not locked in a black box.
Transparent data practices: They’re upfront about where your data lives, who can access it, and what happens in the event of an incident.
Responsive support: You can communicate with real people who understand your risk profile—not just surface-level answers or automated replies.
Affordable recoveries: They don’t make recovering your data prohibitively expensive. Look at their egress policies and price out what it would actually cost to retrieve your data.

When you’re responsible for protecting sensitive data, it’s not enough to be compliant. You need a partner who’s disciplined, trustworthy, and invested in your resilience.

The Backblaze approach: Rigor, transparency, and trust

Pursuing authorizations like GovRAMP and TX-RAMP isn’t easy, but it’s the right thing to do and we’re committed to the process. We believe public sector buyers deserve cloud partners who understand their constraints, meet them where they are, and still bring best-in-class solutions to the table.

But more than that, we’re not stopping at frameworks. Compliance is a floor, not a ceiling. We’ve built our platform on decades of operational rigor and security discipline—not to impress auditors, but to earn your trust. And we’ve structured our products to enable security best practices, not hinder them, including 3x free egress for disaster recovery.

So yes, we’re proudly in the compliance race. But we’re not just chasing badges. We’re building something secure, sustainable, and ready for whatever comes next.

Want to learn more about our GovRAMP journey or how Backblaze supports public sector cloud transformation? Reach out to our Sales team.

The post The Compliance Arms Race: What GovRAMP Means for SLED, Cloud Vendors, and the Rest of Us appeared first on Backblaze Blog | Cloud Storage & Cloud Backup

Introducing AWS Cloud Control API MCP Server: Natural Language Infrastructure Management on AWS

2025-08-13 Kevon Mayers

Post Syndicated from Kevon Mayers original https://aws.amazon.com/blogs/devops/introducing-aws-cloud-control-api-mcp-server-natural-language-infrastructure-management-on-aws/

Today, we’re officially announcing the AWS Cloud Control API (CCAPI) MCP Server. This MCP server transforms AWS infrastructure management by allowing developers to create, read, update, delete, and list resources using natural language. As part of the awslabs/mcp project, this new and innovative tool serves as a bridge between natural language commands and AWS infrastructure deployment and management. This MCP server is powered by the AWS Cloud Control API – a standardized API that allows CRUDL (Create/Read/Update/Delete/List) operations to be performed against AWS and third party resources using a single endpoint.

Key Features:

Leverages AWS Cloud Control API for CRUDL operations for more than 1,200 AWS resources
Enables LLM-powered agents and developers to manage infrastructure with natural language prompts
Provides the option to output Infrastructure as Code (IaC) templates for the infrastructure it will create, so it can still be used with existing CI/CD pipelines
Integrates with AWS Pricing API to provide cost estimates for the infrastructure it will create
Applies security best practices automatically using Checkov

Why Use CCAPI MCP Server?

Simplified Infrastructure Management: No more wrestling with complex templates or documentation
Increased Developer Productivity: Focus on what you need, not how to configure it
Reduced Learning Curve: Onboard new team members faster with natural language commands
LLM Integration: Perfect companion for AI-assisted development workflows

The CCAPI MCP Server transforms infrastructure management by enabling natural language interactions for AWS resource operations. Bridging natural language commands with AWS infrastructure deployment and management, this MCP Server allows developers to manage cloud infrastructure through conversational inputs such as:

Can you create a new S3 bucket for me?
Find all of my EC2 instances and tell me which ones have an instance type that is not t2.large

This significantly reduces configuration overhead and accelerates onboarding for new team members by directly translating developer intent into cloud infrastructure.

Let’s see it in action.

Creating and Managing Cloud Infrastructure

Prerequisites

uv package manager installed
Python 3.x.x installed
AWS credentials with appropriate permissions. The MCP server supports multiple ways to define these credentials. See the MCP documentation for more information. Using dynamic credentials such as one provided via SSO is recommended. For more information on configuring AWS credentials, see the AWS CLI documentation.
An MCP Host application installed that supports MCP Clients and MCP Servers (e.g. Amazon Q Developer, Claude Desktop, Cursor, etc.). To follow along with this blog, install Amazon Q Developer for CLI as described in the installation instructions.

Integration with Developer Tools

To start using the CCAPI MCP server, you will need to set up your server configuration which is typically in a file named mcp.json. For this blog we will focus on using the CCAPI MCP server with Amazon Q Developer. Note that for other MCP Host applications the path to the mcp configuration file may differ. You will need to create the file if it does not already exist in the directory.

1. Global Configuration: ~/.aws/amazonq/mcp.json – Applies to all workspaces

2. Workspace Configuration: .amazonq/mcp.json – Specific to the current workspace

More information can be found in the Amazon Q Developer User Guide.

Configuration file structure

The MCP configuration file uses a JSON format with the following structure:

mcp.json

{
  "mcpServers": {
    "server-name": {
      "command": "command-to-run",
      "args": ["arg1", "arg2"],
      "env": {
        "ENV_VAR1": "value1",
        "ENV_VAR2": "value2"
      }
    }
  }
}

Here is mcp.json with the CCAPI MCP Server configuration:

{
  "mcpServers": {
    "awslabs.ccapi-mcp-server": {
      "command": "uvx",
      "args": [
        "awslabs.ccapi-mcp-server@latest"
      ],
      "env": {
        "AWS_PROFILE": "your named AWS profile",
        "DEFAULT_TAGS": "enabled",
        "SECURITY_SCANNING": "enabled",
        "FASTMCP_LOG_LEVEL": "ERROR"
      },
      "disabled": false,
      "autoApprove": []
    }
  }
}
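Because a stray trailing comma or a curly quote is enough to make this file invalid JSON, it can help to sanity-check it before starting your MCP Host application. The following is a minimal sketch and not part of the CCAPI MCP server; the function name check_mcp_config is hypothetical, and it only checks the keys shown in the structure above.

```python
import json


def check_mcp_config(text: str) -> list[str]:
    """Return a list of problems found in an mcp.json document (empty if none)."""
    try:
        # json.loads rejects trailing commas and curly quotes such as “enabled”.
        config = json.loads(text)
    except json.JSONDecodeError as err:
        return [f"invalid JSON at line {err.lineno}, column {err.colno}: {err.msg}"]

    servers = config.get("mcpServers") if isinstance(config, dict) else None
    if not isinstance(servers, dict):
        return ['missing top-level "mcpServers" object']

    problems = []
    for name, server in servers.items():
        if not isinstance(server, dict):
            problems.append(f"{name}: entry must be an object")
            continue
        if not isinstance(server.get("command"), str):
            problems.append(f'{name}: "command" must be a string')
        if not isinstance(server.get("args", []), list):
            problems.append(f'{name}: "args" must be a list')
        if not isinstance(server.get("env", {}), dict):
            problems.append(f'{name}: "env" must be an object')
    return problems


sample = '{"mcpServers": {"demo": {"command": "uvx", "args": ["pkg@latest"]}}}'
print(check_mcp_config(sample))  # an empty list means no problems were found
```

To check a real file, read it with pathlib.Path(...).read_text() and pass the result to the function.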

Important

Ensure you correctly set your AWS credentials in the MCP server config. It is essential that you properly configure these credentials, as the MCP server uses their associated permissions when invoking the AWS Cloud Control API for CRUDL operations in your AWS account. The server supports multiple methods of consuming these credentials such as AWS profiles, Environment Variables, SSO tokens, etc. You can see some of this in the aws_client.py file. See these docs on using named profiles for more information.

Read Only Mode

If you would like to prevent the MCP server from performing mutating actions (e.g. Create/Update/Delete Resource), you can specify the --readonly flag as demonstrated below:

{
  "mcpServers": {
    "awslabs.ccapi-mcp-server": {
      "command": "uvx",
      "args": [
        "awslabs.ccapi-mcp-server@latest",
        "--readonly"
      ],
      "env": {
        "AWS_PROFILE": "your named AWS profile",
        "DEFAULT_TAGS": "enabled",
        "SECURITY_SCANNING": "enabled",
        "FASTMCP_LOG_LEVEL": "ERROR"
      },
      "disabled": false,
      "autoApprove": []
    }
  }
}

More information on the configuration and tools the CCAPI MCP server provides can be found in the AWS CloudFormation MCP Server documentation.
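To illustrate the idea behind a read-only flag, here is a minimal sketch of how a server could parse --readonly and refuse mutating tools before they run. This is not the CCAPI MCP server's actual code: the tool set, parse_args, and ensure_allowed are hypothetical names chosen for the example.

```python
import argparse

# Illustrative tool names; a real server defines its own set of mutating tools.
MUTATING_TOOLS = {"create_resource", "update_resource", "delete_resource"}


def parse_args(argv: list[str]) -> argparse.Namespace:
    """Parse the launcher arguments, including the optional --readonly flag."""
    parser = argparse.ArgumentParser(description="Illustrative MCP server launcher")
    parser.add_argument(
        "--readonly",
        action="store_true",
        help="reject tools that create, update, or delete resources",
    )
    return parser.parse_args(argv)


def ensure_allowed(tool_name: str, readonly: bool) -> None:
    """Raise before any mutating tool runs when the server is read-only."""
    if readonly and tool_name in MUTATING_TOOLS:
        raise PermissionError(f"{tool_name} is disabled in read-only mode")


args = parse_args(["--readonly"])
ensure_allowed("get_resource", args.readonly)  # read operations pass through
```

Checking the flag inside the server, rather than relying on the LLM to avoid certain tools, keeps the restriction in place regardless of how a prompt is phrased.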

Security Considerations

Ensure the IAM credentials include permissions for Cloud Control API actions (List, Get, Create, Update, Delete). See the AWS Cloud Control API documentation for more information.
Follow IAM least privilege principles
Enable AWS CloudTrail auditing
Consider running in read-only mode with --readonly flag for safer operations
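As a starting point for least privilege, the sketch below builds an IAM policy document for Cloud Control API. It assumes the Cloud Control API actions use the cloudformation: prefix as listed here; please verify the action names against the IAM documentation before use. Note also that Cloud Control API calls the underlying service with your credentials, so the identity additionally needs the relevant service permissions (for example, Amazon S3 permissions to manage buckets). The function name cloud_control_policy is hypothetical.

```python
import json

# Actions that only read resources or request status.
READ_ACTIONS = [
    "cloudformation:GetResource",
    "cloudformation:ListResources",
    "cloudformation:GetResourceRequestStatus",
    "cloudformation:ListResourceRequests",
]
# Actions that change resources or cancel in-flight requests.
WRITE_ACTIONS = [
    "cloudformation:CreateResource",
    "cloudformation:UpdateResource",
    "cloudformation:DeleteResource",
    "cloudformation:CancelResourceRequest",
]


def cloud_control_policy(readonly: bool = True) -> dict:
    """Return an IAM policy document allowing read-only or full CRUDL access."""
    actions = READ_ACTIONS if readonly else READ_ACTIONS + WRITE_ACTIONS
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "CloudControlApi",
                "Effect": "Allow",
                "Action": actions,
                "Resource": "*",
            }
        ],
    }


print(json.dumps(cloud_control_policy(readonly=True), indent=2))
```

Pairing a read-only policy with the --readonly flag gives you two independent layers: the server refuses mutating tools, and the credentials could not perform them anyway.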

Example Use Case: Creating an S3 Bucket with KMS Encryption

IMPORTANT: Ensure you have satisfied all prerequisites before attempting these commands.

1. With the mcp.json file correctly set, try to run a sample prompt. In your terminal, run q chat to start using Amazon Q in the CLI.

Q CLI Initial Load of Cloud Control API MCP Server

2. This will start initializing the MCP servers in the background, allowing you to immediately start using Q Chat even if they are still loading. Note that if the servers have not finished loading, your prompts will be handled without using any MCP servers. To check the status of the servers, run /mcp.

3. Once you have validated that the MCP server was loaded successfully, try a sample command. Simply tell Amazon Q: Create an S3 bucket with versioning and encrypt it using a new KMS key

Amazon Q will use the server to automatically:

Fetch your current environment variables
Use those to fetch your current AWS session info
Create code that defines what is in your prompt
Explain the code that was generated
Run security analysis against the code that was generated (if enabled)
Explain the results of the security analysis
Validate the configuration against AWS Cloud Control API schemas (which use CloudFormation Resource Provider Schemas as their foundation) and IAM policies. This validation ensures compliance with Cloud Control API requirements, which is essential for resource creation
Create the resources directly through Cloud Control API

Note: While CloudFormation schemas are referenced in the validation step, this solution uses Cloud Control API for resource management, not CloudFormation. The schemas are used because they define the standardized resource properties that Cloud Control API expects.

4. First, Amazon Q will mention that it needs to check the environment variables to find information related to the AWS session information. It will inform you about the specific tool it aims to use and will ask for permission. Select y to accept and allow actions.

5. Next, Amazon Q will ask to use get_aws_session_info() to fetch information about the AWS session it should use for subsequent actions. It will use the relevant values from the environment variables defined in the MCP configuration file (e.g. ~/.aws/amazonq/mcp.json).

6. Amazon Q will then display the AWS account ID and region it will use to deploy resources. To start, it will use generate_infrastructure_code() to generate the resource properties for a KMS key that will be sent to Cloud Control API. These properties mirror the structure defined in AWS CloudFormation Resource Provider Schemas (which Cloud Control API uses as its foundation), allowing for security validation through Checkov before deployment. The key will be configured following security best practices, with a key policy scoped to only allow usage within the AWS account.

7. Once Amazon Q has generated the code for the resource, it will use the explain() tool to explain the infrastructure code that was generated. Note that the default tags MANAGED_BY, MCP_SERVER_SOURCE_CODE, and MCP_SERVER_VERSION are added to all resources managed by the CCAPI MCP server. These tags make it easy to identify infrastructure that is being managed by the MCP server. They are configurable and you can optionally disable them, but we highly recommend adding tags to ensure you have visibility into infrastructure that is being managed by the CCAPI MCP server.

8. It will then attempt to use the run_checkov() tool to inspect the security of the code. This tool is triggered because SECURITY_SCANNING was set to enabled in your server configuration file.

9. After Checkov has run, it will then attempt to use the explain() tool again to explain the security findings from the Checkov run. If there were no security issues, it will attempt to proceed. If there were security issues, you will be asked how you’d like to proceed, and Amazon Q will recommend necessary fixes. By default, the checks that passed will only give a minimal summary. If you’d like to get more information, just ask for more details.

10. The next tool that Amazon Q will use is the create_resource() tool. This tool will attempt to create the resource using the AWS Cloud Control API, and then use the get_resource_request_status() tool to check the status of the creation. This tool uses the request token to identify the request that was submitted to the Cloud Control API and uses this to fetch its status information.

11. Amazon Q will continue using the CCAPI MCP server tools as needed until it finishes creation of both the S3 Bucket and KMS Key and will output a summary.

12. Now, ask Amazon Q to make a change potentially negatively affecting security, for example by allowing the S3 bucket to be publicly accessible. While this configuration is generally advised against, sometimes it is necessary – such as when you want to use the S3 bucket for public website hosting. Amazon Q will respond letting you know that what you are asking for is not the best practice, and explain why. However, since this could be a valid request depending on your use case, it will prompt you to confirm.

13. The CCAPI MCP server also has integrations with the AWS Pricing API, so you can even ask for the estimated cost of what it has deployed.

14. Lastly, ask Amazon Q to create a CloudFormation template of what it has created so far so you can either have a backup, or if you want to redeploy something similar, you will have a template to work off. It will use the create_template() tool to accomplish this task.

Note: The create_template() tool comes with predefined settings:

Outputs YAML format by default (can be JSON)
Sets DeletionPolicy to RETAIN
Sets UpdateReplacePolicy to RETAIN
Allows optional parameters for template ID, file saving location, and region specification

For more information, review the tool in the source code.

15. Try one more dangerous operation, attempting to delete all resources within an AWS account. The security checks block this attempt and suggest other alternatives.

16. Finally, ask Amazon Q to just delete what it has created. This time it will use the get_resource() tool to get information about the existing resources it created, the explain() tool to explain the changes that will be made, and finally the delete_resource() tool to delete the resources.

After successfully deleting the resources, it will provide a final summary.
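The create-then-poll behavior described in step 10 can be sketched directly against the Cloud Control API. The example below is a minimal sketch, not the MCP server's implementation: create_and_wait is a hypothetical helper, and it takes the client as a parameter so that it can be exercised without AWS access. It assumes the client behaves like boto3.client("cloudcontrol"), whose create_resource and get_resource_request_status calls return a ProgressEvent containing a RequestToken and an OperationStatus.

```python
import json
import time

# Statuses after which the request will not change any further.
TERMINAL_STATUSES = ("SUCCESS", "FAILED", "CANCEL_COMPLETE")


def create_and_wait(client, type_name: str, desired_state: dict,
                    poll_seconds: float = 2.0, max_polls: int = 30) -> dict:
    """Submit a create request, then poll its request token until it settles."""
    response = client.create_resource(
        TypeName=type_name,
        DesiredState=json.dumps(desired_state),  # properties are sent as a JSON string
    )
    token = response["ProgressEvent"]["RequestToken"]

    for _ in range(max_polls):
        event = client.get_resource_request_status(RequestToken=token)["ProgressEvent"]
        if event["OperationStatus"] in TERMINAL_STATUSES:
            return event
        time.sleep(poll_seconds)
    raise TimeoutError(f"request {token} did not finish after {max_polls} polls")


# Usage, assuming boto3 is installed and credentials are configured:
#   import boto3
#   event = create_and_wait(
#       boto3.client("cloudcontrol"),
#       "AWS::S3::Bucket",
#       {"VersioningConfiguration": {"Status": "Enabled"}},
#   )
```

Resource creation is asynchronous, which is why the request token matters: it is the only handle that ties a later status check back to the request that was submitted.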

Sample Prompts for Easy Start

Sample Prompt	What It Does
“Create a VPC with private and public subnets”	Sets up a complete network environment
“List all my EC2 instances”	Shows running instances across your account
“Create a serverless API for my application”	Deploys API Gateway with Lambda integration
“Set up a load-balanced web application”	Creates ALB with target groups and instances

Conclusion

The AWS Cloud Control API MCP Server represents a significant advancement in AWS infrastructure management, making operations on cloud resources easy to express and access through natural language. Whether you’re streamlining operations, experimenting with LLM-based development, or onboarding new team members, and whether you are using Amazon Q Developer in the CLI or any other MCP Host application (such as Claude Desktop or Cursor), the CCAPI MCP server and its tools offer a truly intuitive way to interact with AWS.


Flexibility to Framework: Building MCP Servers with Controlled Tool Orchestration

2025-08-13 Kevon Mayers

Post Syndicated from Kevon Mayers original https://aws.amazon.com/blogs/devops/flexibility-to-framework-building-mcp-servers-with-controlled-tool-orchestration/

MCP (Model Context Protocol) is a protocol designed to standardize interactions with Generative AI models, making it easier to build and manage AI applications. It provides a consistent way to communicate context with different types of models, regardless of where they’re hosted or how they’re implemented. The protocol helps bridge the gap between model deployment and application development by providing a unified interface for model interactions. While this protocol provides flexibility in tool choice, there are key challenges when the order of tool usage needs to be enforced. In this blog post, you will learn about how I designed this functionality and implemented it into the AWS Cloud Control API (CCAPI) MCP server.

The Challenge – Enforcing Tool Ordering in MCP

When you think of MCP, you likely think of choice. Arguably one of the main reasons you may want to use an MCP server is to allow a Large Language Model (LLM) (through agents) to access a set of tools for tasks such as reading from a database, sending an email, or something along those lines. However, the MCP framework doesn’t provide a native mechanism to enforce the sequence in which tools must be called.

Let’s take two tools as an example: fetch_weather_data() and send_email(). It is reasonable to want every email the LLM sends through your MCP server to include the current weather, which means fetch_weather_data() has to run first. Another example is the pair of tools getOrderId() and getOrderDetail(), where the OrderId is required to subsequently fetch the OrderDetail. Since MCP currently lacks tool ordering preferences, these types of sequential dependencies can be challenging to enforce.

MCP tools are designed to be independent functions that an LLM can invoke as needed. There’s no built-in concept of “workflow” or “sequence” in the MCP framework itself. Each tool call is treated as a separate operation, with no inherent knowledge of what came before or what should come after. This means that by default, an LLM can technically call your tools in any order it chooses, regardless of the logical workflow you intend.

While LLMs excel at flexible decision-making, some scenarios like infrastructure management require strict operational ordering. This presents a unique challenge when building MCP servers: how do you maintain the LLM’s natural flexibility while enforcing critical sequential dependencies?

When you think of Infrastructure as Code (IaC), you think of repeatability, consistency, versioning, and continuous integration/continuous deployment (CI/CD). Within CI/CD you have a set flow:

Pull request is generated
CI/CD pipeline is triggered
Series of steps runs to run linting, security tests, unit tests, end-to-end tests, etc.
A failure in any stage should stop the entire pipeline run
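The fail-fast behavior in that flow is the property worth keeping in mind for the rest of this post. As a minimal sketch, with hypothetical stage names and a hypothetical run_pipeline helper, it looks like this:

```python
from typing import Callable, Optional


def run_pipeline(
    stages: list[tuple[str, Callable[[], bool]]],
) -> tuple[list[str], Optional[str]]:
    """Run stages in order and stop at the first one that fails.

    Returns (names of the stages that passed, name of the failing stage or None).
    """
    passed = []
    for name, check in stages:
        if not check():
            return passed, name  # later stages never run
        passed.append(name)
    return passed, None


stages = [
    ("lint", lambda: True),
    ("security tests", lambda: False),  # simulated failure
    ("unit tests", lambda: True),
]
print(run_pipeline(stages))  # → (['lint'], 'security tests')
```

The order and the stop condition live in ordinary code here, not in anyone's judgment, which is exactly what an LLM choosing tools freely does not give you by default.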

This posed a challenge with IaC and LLMs. Generative AI is non-deterministic, meaning the same prompt may not always generate the same exact response. If the result deviates significantly from what it should be, it is considered a hallucination. So, what can be done to guide the LLM on what you want it to do? Let’s talk about how this was addressed in the CCAPI MCP server.

Understanding MCP Tool Discovery and Initialization

Before diving into the solution, it’s important to understand how MCP servers communicate with AI Agents. During initialization, the MCP protocol follows specific lifecycle phases where capabilities and tools are discovered.

The Model Context Protocol defines a structured lifecycle for client-server connections that ensures proper capability negotiation and state management.

MCP Lifecycle

These phases include:

Initialization: Capability negotiation and protocol version agreement
Operation: Normal protocol communication
Shutdown: Graceful termination of the connection

The initialization phase establishes protocol compatibility and shares implementation details. This is when an AI Agent learns about available tools through schema definitions and receives instructions for tool usage. This initialization process is crucial to the solution, as it’s where AI Agents first discover what tools are available and how they should be used. During this phase, the client sends information about its protocol version, capabilities, and implementation details, and the server responds with its own. This is how tools like Amazon Q CLI receive information about an MCP server’s version, available tools, and usage instructions.

Note: For more information on the MCP lifecycle, see these docs.
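To make the handshake concrete, the sketch below builds the JSON-RPC messages exchanged around initialization, following the message shapes in the MCP specification as I understand them. The client and server names, the version numbers, and the instructions text are placeholders, and the protocol version shown is just one published revision.

```python
import json


def initialize_request(request_id: int) -> dict:
    """Client → server: propose a protocol version and describe the client."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "initialize",
        "params": {
            "protocolVersion": "2025-06-18",
            "capabilities": {},
            "clientInfo": {"name": "example-client", "version": "0.1.0"},
        },
    }


def initialize_result(request_id: int) -> dict:
    """Server → client: agree on a version and advertise tool support."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "result": {
            "protocolVersion": "2025-06-18",
            "capabilities": {"tools": {"listChanged": True}},
            "serverInfo": {"name": "example-server", "version": "0.1.0"},
            # Placeholder text; a server can use this field for usage guidance.
            "instructions": "Call get_aws_session_info() before any other tool.",
        },
    }


# After the result arrives, the client confirms and can then list the tools.
INITIALIZED = {"jsonrpc": "2.0", "method": "notifications/initialized"}
LIST_TOOLS = {"jsonrpc": "2.0", "id": 2, "method": "tools/list"}

for message in (initialize_request(1), initialize_result(1), INITIALIZED, LIST_TOOLS):
    print(json.dumps(message))
```

The instructions field and the tool schemas returned by tools/list are the two places where a server can describe ordering expectations to the LLM, which is what the solution below builds on.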

Solution – Token-Based Tool Orchestration: A New Pattern for AI Agents in MCP

MCP Token Orchestration

MCP presents a specific challenge: tools cannot directly communicate with each other to enforce execution order. The CCAPI MCP server addresses this through a token messenger pattern shown above, where the server generates and controls validation tokens, and the AI Agent (as the MCP client) passes these tokens between tool calls.

Core Implementation:

Function Enhancement – The mcp.tool() decorator transforms each function into a more capable entity. It wraps the function with a schema that defines required inputs and their validation rules, while preserving detailed documentation through docstrings. Each enhanced function clearly communicates its requirements and provides explicit error messages when dependencies aren’t met.
Dependency Discovery – During the initialize phase in the MCP lifecycle, the AI Agent (as the MCP client) receives a complete map of all defined tools and their schemas from the MCP server. The LLM, which is part of the AI Agent, uses these schemas to understand dependencies through both parameter descriptions and required input arguments. For instance, when a tool requires a parameter described as “Result from get_aws_session_info()” and defines security_scan_token as a required input argument, the LLM understands it needs both valid tokens before proceeding. This combination of descriptive text and explicit input requirements enables the AI Agent to execute sequences like get_aws_session_info() → generate_infrastructure_code() → run_checkov() → create_resource().
Token Validation Control – The server generates and controls all workflow tokens through a unified server-side storage system (_workflow_store). Each tool in the workflow generates cryptographically secure tokens, and these tokens are stored server-side with their associated data.

The AI Agent maintains these tokens in its conversation context throughout the workflow, passing them between tool calls. For security, each token used by the AI Agent must be validated against the server’s token storage. Since these tokens are short-lived, they are stored in memory (RAM) and are actively managed by the MCP server, which deletes tokens after use to maintain freshness. Any remaining tokens are automatically cleared when the server process ends or restarts. If a token doesn’t exist in the server’s storage (either because it’s invalid or already consumed), the operation fails immediately with an error. This validation is uniform across all token types, ensuring the AI Agent cannot create or modify tokens.

As the workflow progresses, tools consume existing tokens and generate new ones. For example, when explain() receives a properties_token, it first validates it exists and matches what is in _workflow_store, then consumes it and generates a new explained_properties_token. This creates a cryptographically secure chain of operations that enforces the workflow sequence (generate → scan → create), with server-side validation at every step.

The result is a predictable workflow system with strong security controls – tokens must be generated by the server and validated against server-side storage at each step, helping ensure the integrity of the infrastructure management process. This approach provides robust workflow enforcement within the confines of the current functionality of the FastMCP framework. While explicit schema-defined dependencies like @mcp.tool(depends_on=["run_checkov"]) as mentioned in this GitHub Issue would be ideal and could hopefully be added in future FastMCP versions, the current token-based approach with descriptive parameter names and clear validation provides reliable tool ordering that LLMs consistently follow without confusion.

Potential Limitations and Solutions

Session Management – When an AI Agent’s session ends or refreshes, any in-progress workflows must be restarted. This is by design – tokens are meant to be short-lived and tied to specific workflow sequences. AWS credentials naturally expire within hours as part of standard security practices, providing a natural boundary for workflow sessions.
Concurrent Workflows – Each AI Agent interaction operates independently, which is appropriate for maintaining security boundaries between different workflow instances. While this means each session starts fresh, it ensures clean separation between different infrastructure operations.
Implementation Options – For organizations requiring workflow persistence, traditional database storage could maintain session state between restarts. However, since tokens are designed to be short-lived security controls, most implementations can rely on the default in-memory storage with natural session boundaries.
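For the persistence option mentioned above, one possible shape is a small SQLite-backed store with an explicit expiry time. This is a sketch under my own assumptions, not something the CCAPI MCP server ships: the class name, table layout, and 15-minute default lifetime are all hypothetical.

```python
import json
import secrets
import sqlite3
import time
from typing import Optional


class PersistentWorkflowStore:
    """Single-use workflow tokens kept in SQLite so they survive a restart."""

    def __init__(self, path: str = ":memory:", ttl_seconds: float = 900.0):
        self._db = sqlite3.connect(path)
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS tokens ("
            "token TEXT PRIMARY KEY, kind TEXT NOT NULL, "
            "data TEXT NOT NULL, expires_at REAL NOT NULL)"
        )
        self._db.commit()
        self._ttl = ttl_seconds

    def issue(self, kind: str, data: dict, now: Optional[float] = None) -> str:
        """Store data under a fresh token that expires after the TTL."""
        now = time.time() if now is None else now
        token = secrets.token_urlsafe(16)
        self._db.execute(
            "INSERT INTO tokens VALUES (?, ?, ?, ?)",
            (token, kind, json.dumps(data), now + self._ttl),
        )
        self._db.commit()
        return token

    def consume(self, token: str, kind: str, now: Optional[float] = None) -> dict:
        """Return the stored data once; unknown, reused, or expired tokens fail."""
        now = time.time() if now is None else now
        row = self._db.execute(
            "SELECT data, expires_at FROM tokens WHERE token = ? AND kind = ?",
            (token, kind),
        ).fetchone()
        if row is None:
            raise KeyError("unknown or already used token")
        self._db.execute("DELETE FROM tokens WHERE token = ?", (token,))
        self._db.commit()
        data, expires_at = row
        if now >= expires_at:
            raise KeyError("token expired")
        return json.loads(data)
```

Passing a file path instead of the default ":memory:" keeps tokens across restarts, and the expiry column preserves the short-lived property that the in-memory design gets for free.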

The token messenger pattern provides a solid foundation for secure workflow orchestration, with its intentionally ephemeral tokens ensuring proper tool sequencing and data integrity during infrastructure operations.

The Future of MCP

While the above solution works, this process made me think about the future of MCP and how it can and should continue to grow. There are many updates to the framework I’ve seen recently, and it’s great to see activity. For Agentic AI in general, there are strong signs that the future of agentic platforms may be more deterministic in nature, as highlighted by Claude Code’s new support for lifecycle hooks. Per their docs, “Hooks provide deterministic control over Claude Code’s behavior, ensuring certain actions always happen rather than relying on the LLM to choose to run them.” For IaC and other deterministic technologies that teams want to integrate AI with, this is essential for wide-scale adoption.

Conclusion

The journey of Model Context Protocol (MCP) and this new frontier of leveraging AI for managing cloud infrastructure continues to evolve, presenting both opportunities and challenges in the world of cloud computing and artificial intelligence. Current approaches using prompt loading and parameter dependencies have helped address initial challenges around tool ordering and security protocols, demonstrating how MCP can be effectively used in enterprise applications.

While the current implementation using workflow tokens and validation checks provides a functional solution, we continue to explore ways to enhance the protocol’s capabilities. For those interested in contributing to MCP’s evolution, you can find our proposals for protocol improvements, including enhanced dependency management, in the modelcontextprotocol GitHub org as well as in the FastMCP GitHub repository.

If you’d like to learn more about the AWS Cloud Control API MCP server mentioned in this blog, check out the documentation and GitHub repo. If you’d like to get hands on with it and other AWS MCP servers, check out this AWS workshop. Happy vibe coding my friends.


The Essential Guide to Disaster Recovery: Building Resilience for Your Enterprise

2025-08-07 Kari Rivas

Post Syndicated from Kari Rivas original https://www.backblaze.com/blog/the-essential-guide-to-disaster-recovery-building-resilience-for-your-enterprise/

A decorative image showing a computer with various files and a warning sign.

Disaster recovery (DR) is a top-line priority for enterprise organizations facing increasingly complex threats—sophisticated ransomware attacks, widespread cloud outages, and regulatory risks. The ability to recover quickly and maintain business continuity isn’t just a technical necessity—it’s a competitive imperative.

Today, I’m breaking down foundational strategies for enterprise DR readiness. You’ll find practical guidance on infrastructure design, site strategy, backup best practices, and more to help you take immediate action.

Get the full guide

Our “Essential Guide to Disaster Recovery Planning” offers a comprehensive framework for designing a DR plan that protects your business across multiple threat vectors.

The four stages to disaster recovery.

Comprehensive DR requires a multi-tiered approach. Your DR strategy should encompass four critical stages: prevention, preparation, mitigation, and recovery.

Choose the right infrastructure: Beyond legacy limitations

Many enterprises still rely on legacy storage technologies like tape, which create delays in restoration and introduce hardware failure risks. Shifting to cloud-first infrastructure reduces these vulnerabilities while unlocking scalability and location diversity. It also supports immutability features—critical for ransomware resilience—and simplifies compliance with evolving regulations.

Cloud platforms also unlock new options for data governance and sovereignty. Enterprises operating across regions or industries governed by strict data residency laws can configure cloud storage to maintain compliance while reducing operational overhead.

As enterprise backup and archive needs grow, it becomes vital to distinguish between long-term cold storage and actively accessible data. With clear infrastructure planning, organizations can streamline operations and ensure faster recovery without overspending on high-performance systems for archival workloads.

What is Object Lock?

Object Lock is the feature in cloud platforms that enables immutability. With immutability, your data cannot be changed, deleted, or encrypted. This is the ultimate protection against ransomware.
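For readers who want to see what this looks like in practice, here is a minimal sketch assuming an S3-compatible API and a bucket that was created with Object Lock enabled. The helper locked_put_kwargs is hypothetical; it only builds the arguments for boto3's put_object call, which accepts ObjectLockMode and ObjectLockRetainUntilDate. Your provider's supported modes and retention rules may differ, so check its documentation.

```python
from datetime import datetime, timedelta, timezone


def locked_put_kwargs(bucket: str, key: str, body: bytes, retain_days: int,
                      mode: str = "COMPLIANCE") -> dict:
    """Build keyword arguments for an S3-style put_object call with Object Lock."""
    if mode not in ("COMPLIANCE", "GOVERNANCE"):
        raise ValueError("mode must be COMPLIANCE or GOVERNANCE")
    if retain_days <= 0:
        raise ValueError("retain_days must be positive")
    retain_until = datetime.now(timezone.utc) + timedelta(days=retain_days)
    return {
        "Bucket": bucket,
        "Key": key,
        "Body": body,
        "ObjectLockMode": mode,
        "ObjectLockRetainUntilDate": retain_until,  # object is immutable until then
    }


# Usage, assuming boto3 is installed and the bucket has Object Lock enabled:
#   import boto3
#   s3 = boto3.client("s3", endpoint_url="<your provider's S3 endpoint>")
#   s3.put_object(**locked_put_kwargs("backups", "db.dump", data, retain_days=30))
```

Until the retention date passes, that version of the object cannot be overwritten or deleted, which is what keeps a backup out of reach of ransomware that has obtained write credentials.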

DR site temperatures: Hot, warm, or cold?

Depending on your recovery time objective (RTO), different types of recovery sites offer different benefits:

Hot sites: Fully mirrored and ready for instant failover—great for mission-critical apps but expensive.
Warm sites: Pre-configured but not fully live—strike a balance between cost and speed.
Cold sites: Infrastructure is ready but requires manual configuration—most affordable, but slowest to recover.

Enterprises evaluating DR readiness should consider whether their current configuration meets their recovery time goals—and whether they’re optimizing for the right workloads. Comparing hot, warm, and cold site models can help strike the right balance between performance and budget.

Build vs. buy vs. cloud: Finding the right fit

Selecting a DR site is fundamental to your strategy. There are four main approaches to establishing a DR site: building your own, buying services from a co-location provider, buying public cloud storage, or leveraging a disaster recovery as a service (DRaaS) solution. Each approach offers distinct advantages and drawbacks.

Building an on-premises DR site

Pros: It provides complete control over the DR environment, offering greater customization and security.

Cons: Significant upfront investment in hardware, software, and facility infrastructure and management. Requires ongoing maintenance and staffing costs. Limited scalability to accommodate future growth.

Buying co-located DR storage

Pros: It offers a cost-effective alternative to building your own site. Co-location providers manage aspects of the physical infrastructure, reducing your IT team’s workload.

Cons: Less control over the environment compared to an on-premises solution. May require additional investment for network connectivity and configuration. Potential vendor lock-in with specific co-location providers.

Buying public cloud-based DR storage

Pros: Highly scalable and cost-effective. Cloud service providers (CSPs) manage the physical infrastructure, reducing your IT team’s workload. Features like Object Lock help address security concerns versus on-premises storage.

Cons: Retrieving large volumes of data may be slow due to bandwidth constraints.
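The bandwidth constraint is easy to quantify before a disaster rather than during one. The sketch below estimates download time for a given data volume and link speed; the 80% link efficiency and the example sizes are assumptions for illustration, not measured values.

```python
def restore_hours(data_tb, link_gbps, efficiency=0.8):
    """Estimate hours needed to download data_tb terabytes over a link."""
    bits = data_tb * 8 * 10**12                    # decimal terabytes to bits
    usable_bps = link_gbps * 10**9 * efficiency    # assumed usable throughput
    return bits / usable_bps / 3600


for size_tb in (10, 100, 500):
    print(f"{size_tb} TB over 1 Gbps: {restore_hours(size_tb, 1):.1f} hours")
```

Running numbers like these against your RTO shows quickly whether a full restore over the network is realistic or whether critical data needs a faster path.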

Buying disaster recovery as a service (DRaaS)

Pros: Highly scalable and cost-effective solution. Eliminates the need for upfront infrastructure investment. DRaaS providers manage the entire DR environment and provide technical support, freeing up your IT staff.

Cons: Reliance on a third-party provider for critical data and infrastructure. Potential concerns over network latency and vendor lock-in. Security considerations require a careful evaluation of the cloud provider’s practices.

Backup vs. replication: Know the difference

Replication copies data in real-time, but that also means it can copy infected or corrupted data. Backups, on the other hand, offer point-in-time recoveries so you can restore data even after a ransomware attack.

This distinction between backups and replication is critical: If you only rely on replication, you could end up replicating the attack itself.
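A toy simulation can illustrate the difference. The classes below are hypothetical stand-ins, not any vendor's API: the replica always mirrors the current state of the primary, while the backup store keeps labeled point-in-time snapshots.

```python
import copy


class Replica:
    """Mirrors the primary immediately, so it always holds the latest state."""

    def __init__(self):
        self.data = {}

    def sync(self, primary):
        self.data = copy.deepcopy(primary)


class BackupStore:
    """Keeps point-in-time snapshots keyed by a label."""

    def __init__(self):
        self.snapshots = {}

    def snapshot(self, label, primary):
        self.snapshots[label] = copy.deepcopy(primary)

    def restore(self, label):
        return copy.deepcopy(self.snapshots[label])


primary = {"report.docx": "quarterly numbers"}
replica, backups = Replica(), BackupStore()

replica.sync(primary)
backups.snapshot("monday", primary)

# Simulated ransomware: the file is encrypted on the primary.
primary["report.docx"] = "ENCRYPTED"
replica.sync(primary)                  # replication copies the damage
backups.snapshot("tuesday", primary)   # a later backup captures it too

print(replica.data["report.docx"])                # ENCRYPTED
print(backups.restore("monday")["report.docx"])   # quarterly numbers
```

The replica has no clean copy left, but the earlier snapshot still does, which is the property point-in-time backups add.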

The optimal approach to DR depends on your specific needs.

For frequently accessed data requiring near-instantaneous recovery, consider a combination of hot site methodology and real-time data replication. This offers the fastest failover, but can come at a higher cost.
For critical data with acceptable downtime, a warm site with replicated immutable backups at a secondary location (either on-premises or in the cloud) provides a good balance between cost and recovery time. While requiring some manual intervention, it offers protection against malware replicating to the DR site.
For less critical data or archival purposes, cold storage with periodic backups is a cost-effective option. Backups offer a historical record and are less susceptible to malware infection compared to replicated data, particularly if Object Lock is enabled for immutability.

SaaS outages are a threat you can’t ignore

Although built for high availability, SaaS apps don’t guarantee protection against data loss. Tools like Microsoft 365 and Google Workspace are built for uptime, not recovery. Misconfigurations, insider threats, and accidental deletions remain common risks. Enterprises should take control of their own retention policies with dedicated SaaS backup strategies, including regular point-in-time snapshots and recovery testing.

Additionally, planning for SaaS outages should include identifying local alternatives for core business functions. Can teams temporarily revert to offline workflows? Are key contacts available outside of email or Slack? Defining fallback protocols ensures that productivity doesn’t grind to a halt even if your primary tools go dark.

Assembling your incident response team

The incident response team (IRT) is the backbone of your DR response and is responsible for leading the recovery efforts during a disaster. Here’s a breakdown of possible key IRT roles:

Incident commander: Oversees the entire incident response process, making critical decisions and delegating tasks to team members.
Technical lead: Provides technical expertise, directing recovery efforts for IT infrastructure and data restoration.
Communications lead: Handles external and internal communication, ensuring timely updates for stakeholders and mitigating potential reputational damage.
Documentation lead: Maintains the DR runbook, ensuring its accuracy and updating it with post-incident findings.
Legal counsel: Provides legal guidance and ensures compliance with relevant regulations during the response and recovery process.

Objectives, priorities, and KPIs: The compass of your DR strategy

A robust DR strategy starts with clearly defined objectives and priorities. These guide your approach and decision-making during a disaster recovery event. Your strategy should prioritize rapid recovery of critical systems and applications to minimize operational downtime and resume normal functions swiftly.

Prioritization: Not all data (or systems) are created equal

Prioritizing your critical business applications depends on a deep understanding of your business. Collaborate with internal partners to identify critical business applications that are essential for ongoing operations. Not all applications require immediate restoration. Prioritize systems based on their impact on core business functions.

Documentation is key

A popular mantra for DR specialists is “Test the plan; don’t plan the test.” Your DR plans must be clearly documented as working recipes for application and data recovery, including dependencies and prerequisites. Document the recovery procedures for each critical application, outlining the steps required to bring them back online. This ensures your IT team can efficiently restore essential services during a disaster.

Primary DR objectives

Minimize data loss: The primary objective is to minimize data loss through regular backups and secure storage practices.
Ensure business continuity: The DR plan aims to rapidly recover operation of critical functions during a disaster, minimizing disruption to the business goals.
Optimize costs: Application and data recovery needs to balance speed and costs to ensure recoverability without unnecessarily increasing IT spending.

Compliance considerations

Compliance regulations might influence your DR priorities. Understand any industry-specific regulations or data privacy laws that might dictate specific data protection and recovery timeframes.

Collaborative RTO and RPO setting

Working with internal partners to set RTOs and RPOs ensures alignment across the organization.

The recovery time objective (RTO) defines the acceptable timeframe for restoring critical applications to a functional state.
The recovery point objective (RPO) defines the maximum amount of data loss, measured as a window of time, that is tolerable in the event of a disaster.
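One way to sanity-check these targets is to compare them with how the environment actually behaves. In this sketch the workload names and all figures are hypothetical; the logic assumes that worst-case data loss equals the interval between backups and that recovery time comes from your most recent test.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    name: str
    rto_hours: float                 # target time to a usable state
    rpo_hours: float                 # target maximum data-loss window
    backup_interval_hours: float     # how often backups actually run
    measured_recovery_hours: float   # result of the last recovery test


def check(w):
    """Return a list of objectives the workload currently misses."""
    misses = []
    # Worst case, a disaster hits just before the next backup runs.
    if w.backup_interval_hours > w.rpo_hours:
        misses.append("RPO")
    if w.measured_recovery_hours > w.rto_hours:
        misses.append("RTO")
    return misses


workloads = [
    Workload("orders-db", rto_hours=1, rpo_hours=0.25,
             backup_interval_hours=1, measured_recovery_hours=0.75),
    Workload("file-archive", rto_hours=24, rpo_hours=24,
             backup_interval_hours=12, measured_recovery_hours=30),
]
for w in workloads:
    print(w.name, check(w) or "meets both objectives")
```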

Stakeholders need to understand the realistic trade-offs involved in setting RTOs and RPOs, balancing the need for quick recovery with resource and cost limitations. Achieving extremely short RTOs, such as recovery within minutes, might require substantial investments in advanced infrastructure, redundant systems, and skilled personnel. Setting achievable RTOs and RPOs that effectively balance the need for swift recovery with the financial limitations of the organization requires open communication and collaboration.

Restore vs. recovery: Understanding the nuances

It’s important to distinguish between data restoration and system recovery. Data restoration specifically involves retrieving data from backups. On the other hand, system recovery encompasses the comprehensive restoration of data, applications, configurations, and user accounts to fully restore system functionality.

Your RTOs should focus on the time it takes to bring an application to a usable state, not just the time to recover the data.

Setting expectations

Employees might have unrealistic expectations regarding recovery times during a disaster. Educate the organization on the DR process and the inherent complexities involved.

Developing measurable KPIs

Key performance indicators (KPIs) are your guiding metrics for tracking your progress and measuring the effectiveness of your DR strategy. Here are some key DR-related KPIs to consider:

RTO achievement rate: Tracks the percentage of times critical applications are restored within the established RTO.
RPO achievement rate: Measures the percentage of recoveries in which data loss stays within the defined RPO.
DR plan testing frequency: Monitors how often the DR plan is tested to ensure its effectiveness.
Mean time to recovery (MTTR): Tracks the average time taken to recover critical applications after a disaster.
Data loss rate: Measures the amount of data lost during a disaster compared to the established RPO.

These KPIs provide valuable insights into your DR preparedness and help identify areas for improvement.
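As a rough illustration, several of these KPIs can be computed from a simple log of recovery tests and incidents. The records and field names below are hypothetical; a real log would come from your own DR tooling.

```python
from statistics import mean

# Each record is one recovery test or real incident (hypothetical values).
events = [
    {"app": "orders-db", "rto_h": 1.0, "recovery_h": 0.8, "rpo_h": 0.25, "loss_h": 0.2},
    {"app": "orders-db", "rto_h": 1.0, "recovery_h": 1.5, "rpo_h": 0.25, "loss_h": 0.1},
    {"app": "crm", "rto_h": 4.0, "recovery_h": 3.0, "rpo_h": 1.0, "loss_h": 2.0},
    {"app": "wiki", "rto_h": 24.0, "recovery_h": 6.0, "rpo_h": 24.0, "loss_h": 12.0},
]


def achievement_rate(events, actual_key, target_key):
    """Percentage of events where the actual value met its target."""
    met = sum(1 for e in events if e[actual_key] <= e[target_key])
    return 100 * met / len(events)


rto_rate = achievement_rate(events, "recovery_h", "rto_h")
rpo_rate = achievement_rate(events, "loss_h", "rpo_h")
mttr = mean(e["recovery_h"] for e in events)

print(f"RTO achievement rate: {rto_rate:.0f}%")
print(f"RPO achievement rate: {rpo_rate:.0f}%")
print(f"MTTR: {mttr:.2f} hours")
```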

Strengthen your RTO and RPO goals with the cloud

Recovery time objectives (RTOs) and recovery point objectives (RPOs) are the backbone of any DR plan. Yet many organizations set unrealistic targets without fully accounting for infrastructure, bandwidth, or cost constraints.

Establishing tiers of RTO and RPO based on data type or application criticality helps organizations avoid overengineering. Not every workload needs sub-hour recovery—archived legal files or marketing collateral may tolerate 24+ hour RTOs. Grouping systems into priority tiers ensures efficient use of budget and infrastructure while keeping SLAs aligned to business risk.
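Tiering can be expressed as a small lookup. The tier names and thresholds here are hypothetical placeholders; the point of the sketch is that each workload lands in the least demanding tier that still satisfies its requirements, which is what keeps spending in check.

```python
# Hypothetical tier definitions: (name, guaranteed RTO hours, guaranteed RPO hours).
TIERS = [
    ("tier-1", 1, 0.25),
    ("tier-2", 8, 4),
    ("tier-3", 48, 24),
]


def assign_tier(required_rto_h, required_rpo_h):
    """Pick the least demanding tier that still satisfies both requirements."""
    for name, rto_h, rpo_h in reversed(TIERS):
        if rto_h <= required_rto_h and rpo_h <= required_rpo_h:
            return name
    raise ValueError("no tier is strict enough for these requirements")


print(assign_tier(48, 24))   # archived files that tolerate a long outage
print(assign_tier(1, 0.5))   # a system that must be back within the hour
```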

Improving these metrics often comes down to using the right storage architecture. By offloading backup workloads to cost-effective cloud storage with integrated immutability and replication, enterprises can improve RTO and RPO without the overhead of traditional DR environments.

A proactive, iterative approach

A DR plan isn’t a one-time project—it’s a living process that should evolve with the business. Every test, every incident, and every infrastructure change is an opportunity to improve.

Strong DR programs rely on frequent validation, leadership alignment, role clarity, and avoiding common missteps. As IT leaders face new threats and shifting architectures, resiliency comes from readiness—not just recovery.

Testing is everything

Even the most comprehensive DR plans can falter if they aren’t regularly validated. Testing ensures that backup data is restorable, that systems behave as expected under stress, and that team roles are clearly understood.

Testing also gives stakeholders across departments a shared language for discussing DR. Finance understands the cost implications of downtime, Legal sees the impact of non-compliance, and Security can stress-test assumptions about containment and escalation. When testing is multidisciplinary, recovery isn’t just possible—it’s predictable.

Organizations that incorporate routine DR drills and testing into their operations tend to recover faster and more confidently. Effective exercises can include walk-throughs, tabletop simulations, and full-scale failover tests. The goal isn’t just compliance—it’s ensuring the organization can execute when it matters most.

Cost transparency and budgeting for DR

Budget uncertainty often limits the scope and effectiveness of DR plans. Legacy vendors may impose hidden fees for egress, API operations, or early deletion, making it difficult to forecast the total cost of a recovery event. Cloud-native solutions with transparent pricing models allow IT and finance teams to plan confidently.

Establishing a clear TCO framework—including hardware, licensing, testing, and human resources—can help justify DR investments and avoid budget shortfalls when they matter most. DR isn’t just insurance—it’s a measurable part of digital operational excellence.

Final thoughts

Disaster recovery isn’t optional—it’s essential. With threats ranging from cyberattacks to cloud outages, every organization needs a plan that’s tested, documented, and designed for rapid recovery.

Backblaze B2 helps you implement affordable, scalable, and secure DR strategies with:

Immutable backups
Flexible recovery options
Transparent pricing (no egress fees)
Seamless integrations with backup tools like Veeam, MSP360, and more

Download the full ebook, “The Essential Guide to Disaster Recovery Planning,” to get started on your journey to resilience.

The post The Essential Guide to Disaster Recovery: Building Resilience for Your Enterprise appeared first on Backblaze Blog | Cloud Storage & Cloud Backup

Introducing Amazon Elastic VMware Service for running VMware Cloud Foundation on AWS

2025-08-05 Micah Walter

Post Syndicated from Micah Walter original https://aws.amazon.com/blogs/aws/introducing-amazon-elastic-vmware-service-for-running-vmware-cloud-foundation-on-aws/

Today, we’re announcing the general availability of Amazon Elastic VMware Service (Amazon EVS), a new AWS service that lets you run VMware Cloud Foundation (VCF) environments directly within your Amazon Virtual Private Cloud (Amazon VPC). With Amazon EVS, you can deploy fully functional VCF environments in just hours using a guided workflow, while running your VMware workloads on qualified Amazon Elastic Compute Cloud (Amazon EC2) bare metal instances and seamlessly integrating with AWS services such as Amazon FSx for NetApp ONTAP.

Many organizations running VMware workloads on premises want to move to the cloud to benefit from improved scalability, reliability, and access to cloud services, but migrating these workloads often requires substantial changes to applications and infrastructure configurations. Amazon EVS lets customers continue using their existing VMware expertise and tools without having to re-architect applications or change established practices, thereby simplifying the migration process while providing access to AWS’s scale, reliability, and broad set of services.

With Amazon EVS, you can run VMware workloads directly in your Amazon VPC. This gives you full control over your environments while being on AWS infrastructure. You can extend your on-premises networks and migrate workloads without changing IP addresses or operational runbooks, reducing complexity and risk.

Key capabilities and features

Amazon EVS delivers a comprehensive set of capabilities designed to streamline your VMware workload migration and management experience. The service enables seamless workload migration without the need for replatforming or changing hypervisors, which means you can maintain your existing infrastructure investments while moving to AWS. Through an intuitive, guided workflow on the AWS Management Console, you can efficiently provision and configure your EVS environments, significantly reducing the complexity to migrate your workloads to AWS.

With Amazon EVS, you can deploy a fully functional VCF environment running on AWS in a few hours. This process eliminates many of the manual steps and potential configuration errors that often occur during traditional deployments. Furthermore, with Amazon EVS you can optimize your virtualization stack on AWS. Given that the VCF environment runs inside your VPC, you have full administrative access to the environment and the associated management appliances. You also have the ability to integrate third-party solutions, from external storage such as Amazon FSx for NetApp ONTAP or Pure Cloud Block Store to backup solutions such as Veeam Backup and Replication.

The service also gives you the ability to self-manage or work with AWS Partners to build, manage, and operate your environments. This provides you with flexibility to match your approach with your overall goals.

Setting up a new VCF environment

Organizations can streamline their setup process by ensuring they have all the necessary prerequisites in place ahead of creating a new VCF environment. These prerequisites include having an active AWS account, configuring the appropriate AWS Identity and Access Management (IAM) permissions, and setting up an Amazon VPC with sufficient CIDR space and two Route Server endpoints, with each endpoint having its own peer. Additionally, customers will need to have their VMware Cloud Foundation license keys ready, secure Amazon EC2 capacity reservations specifically for i4i.metal instances, and plan their VLAN subnet information.
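Subnet planning is one prerequisite that can be checked offline. The sketch below uses Python's standard `ipaddress` module to confirm that planned VLAN subnets fit inside the VPC CIDR and do not overlap; the CIDR ranges and VLAN names are hypothetical and do not reflect the actual set of subnets Amazon EVS requires.

```python
import ipaddress
from itertools import combinations


def validate_plan(vpc_cidr, vlan_cidrs):
    """Return a list of problems found in a VLAN subnet plan."""
    vpc = ipaddress.ip_network(vpc_cidr)
    subnets = {name: ipaddress.ip_network(c) for name, c in vlan_cidrs.items()}
    problems = []
    for name, net in subnets.items():
        if not net.subnet_of(vpc):
            problems.append(f"{name} ({net}) is outside {vpc}")
    for (a, net_a), (b, net_b) in combinations(subnets.items(), 2):
        if net_a.overlaps(net_b):
            problems.append(f"{a} overlaps {b}")
    return problems


# Hypothetical plan with one deliberate mistake.
plan = {
    "management": "10.0.10.0/24",
    "vmotion": "10.0.11.0/24",
    "vsan": "10.0.11.128/25",   # deliberately overlaps vmotion
}
print(validate_plan("10.0.0.0/16", plan))
```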

To help ensure a smooth deployment process, we’ve provided a Getting started hub, which you can access from the EVS homepage as well as a comprehensive guide in our documentation. By following these preparation steps, you can avoid potential setup delays and ensure a successful environment creation.

Screenshots of EVS onboarding

Let’s walk through the process of setting up a new VCF environment using Amazon EVS.

Screenshots of EVS onboarding

You will need to provide your Site ID, which is allocated by Broadcom when purchasing VCF licenses, along with your license keys. To ensure a successful initial deployment, you should verify you have sufficient licensing coverage for a minimum of 256 cores. This translates to at least four i4i.metal instances, with each instance providing 64 physical cores.
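The arithmetic behind that minimum is straightforward, and the same calculation applies when sizing larger environments. This sketch assumes licensing scales linearly with physical cores per host, which is an assumption to confirm against your own license terms.

```python
import math

CORES_PER_HOST = 64        # physical cores per i4i.metal instance, per the text
MIN_LICENSED_CORES = 256   # minimum coverage for an initial deployment

# Smallest host count that reaches the minimum core coverage.
MIN_HOSTS = math.ceil(MIN_LICENSED_CORES / CORES_PER_HOST)


def cores_required(host_count):
    """Cores to license for host_count hosts (assumes per-core licensing)."""
    if host_count < MIN_HOSTS:
        raise ValueError(f"an initial deployment needs at least {MIN_HOSTS} hosts")
    return host_count * CORES_PER_HOST


print(MIN_HOSTS)           # 4
print(cores_required(6))   # 384
```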

This licensing requirement helps you maintain optimal performance and ensures your environment meets the necessary infrastructure specifications. By confirming these requirements upfront, you can avoid potential deployment delays and ensure a smooth setup process.

Screenshots of EVS onboarding

Once you have provided all the required details, you will be prompted to specify your host details. These are the underlying Amazon EC2 instances that your VCF environment will get deployed in.

Screenshots of EVS onboarding

Once you have filled out details for each of your host instances, you will need to configure your networking and management appliance DNS details. For further information on how to create a new VCF environment on Amazon EVS, follow the documentation here.

Screenshots of EVS onboarding

After you have created your VCF environment, you will be able to look over all of the host and configuration details through the AWS Console.

Additional things to know

Amazon EVS currently supports VCF version 5.2.1 and runs on i4i.metal instances. Future releases will add support for more VCF versions, licensing options, and instance types to provide even more flexibility for your deployments.

Amazon EVS provides flexible storage options. Your Amazon EVS local instance storage is powered by VMware’s vSAN solution, which pools local disks across multiple ESXi hosts into a single distributed datastore. To scale your storage, you can leverage external Network File System (NFS) or iSCSI-based storage solutions. For example, Amazon FSx for NetApp ONTAP is particularly well-suited for use as an NFS datastore or shared block storage over iSCSI.

Additionally, Amazon EVS makes connecting your on-premises environments to AWS simple. You can connect from an on-premises vSphere environment into Amazon EVS using a Direct Connect connection or a VPN that terminates into a transit gateway. Amazon EVS also manages the underlying connectivity from your VLAN subnets into your VMs.

AWS provides comprehensive support for all AWS services deployed by Amazon EVS, handling direct customer support while engaging with Broadcom for advanced support needs. Customers must maintain AWS Business Support on accounts running the service.

Availability and pricing

Amazon EVS is now generally available in US East (N. Virginia), US East (Ohio), US West (Oregon), Europe (Frankfurt), Europe (Ireland), and Asia Pacific (Tokyo) AWS Regions, with additional Regions coming soon. Pricing is based on the Amazon EC2 instances and AWS resources you use, with no minimum fees or upfront commitments.

To learn more, visit the Amazon EVS product page.

Backblaze Drive Stats for Q2 2025

2025-08-05 Drive Stats Team

Post Syndicated from Drive Stats Team original https://www.backblaze.com/blog/backblaze-drive-stats-for-q2-2025/

With hundreds of thousands of hard drives spinning 24/7, our data centers are less like peaceful white-noise oases and more like a series of obstacle courses, if said obstacle courses were about managing over four exabytes of customer data, from archival backups to streaming media to AI training datasets. Sure, they’re obstacle courses we all (and I’m including you, users of the internet) collectively create, but it’s no less of a balancing act to find the contestants (erm, hard drives) that can go the distance.

And we, dear readers, get to watch it all. Welcome to Drive Stats: where failure is inevitable, survival is fascinating, and every quarter brings a new leaderboard.

As of June 30, 2025, we had 321,201 drives under management. Of that total, there were 3,971 boot drives and 317,230 data drives. Stay tuned as we take our standard peek into quarterly and lifetime failure rates, and do a deep dive into the 20TB+ club.

As always, we’ll see you in the comments section. This month, you’ll also get three (count ‘em, three!) opportunities to talk to us in person as well—virtually at our Drive Stats LinkedIn Live on August 5 (today), or twice in Las Vegas at DefCon on August 7 and 8.

Sign up for the Drive Stats LinkedIn Live

Ready to dive deeper into the data? Tune in today at 10:00 a.m. PT, to query the Drive Stats team, Stephanie Doyle and Pat Patterson. We’ll see you there!

Drive Stats by the numbers: The digest version

An infographic summarizing key data points in this report, including drive count, drive failures, drive days, drive population by manufacturer, and a summary of the quarterly, annual, and lifetime AFRs.

Q2 2025 hard drive failure rates

For those that are new to the Drive Stats report, it’s worth mentioning that we have certain criteria that we use to select drives for consideration each quarter. We’ll discuss those in the next section, but for now, let’s talk about the data. The table below shows the failure rates for Q2 2025.

Backblaze Hard Drive Failure Rates for Q2 2025

Reporting period April 1, 2025–June 30, 2025 inclusive
Drive models with drive count > 100 as of June 30, 2025 and drive days > 10,000 in Q2 2025
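For readers working with the raw dataset, the inclusion rule in the note above can be sketched as a simple filter. The rows, model names, and field names here are hypothetical placeholders rather than actual Drive Stats records.

```python
def meets_criteria(model, min_drives=100, min_drive_days=10_000):
    """Apply the quarterly thresholds quoted above (strictly greater than)."""
    return (model["drive_count"] > min_drives
            and model["drive_days"] > min_drive_days)


# Hypothetical rows; the model names and numbers are placeholders.
models = [
    {"model": "MODEL-A", "drive_count": 1_014, "drive_days": 92_000},
    {"model": "MODEL-B", "drive_count": 60, "drive_days": 5_400},
    {"model": "MODEL-C", "drive_count": 250, "drive_days": 9_000},
]
included = [m["model"] for m in models if meets_criteria(m)]
print(included)
```

The thresholds are parameters, so the same function could apply different cutoffs for other reporting periods.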

Notes and observations

The annualized failure rate (AFR) is lower this quarter. We had some major fluctuations last quarter. Quoting ourselves from May 2025:

The quarterly failure rate is slightly higher. The quarterly failure rate went up from 1.35% to 1.42%. As with the zero-failure club, our higher-end outlier AFRs show some of the usual suspects:

We’re now back down to 1.36%. What’s changed?

Big swings in our higher-end failure rates: Well, some of the drives with higher failure rates have come down quite a bit. Notably, that includes the 12TB Seagate model ST12000NM0007, which was at a whopping 9.47% failure rate last quarter—down this quarter to only 3.58%. With its drive count holding more or less steady (1,038 in Q1 and 1,014 in Q2), that means a real change in failure rates. Note that this drive was at 8.72% in Q4 2024, so it’s worth keeping an eye on whether this is a fluke or a new pattern. Other significant drops include the 12TB HGST model HUH721212ALN604 (Q1: 4.97%; Q2: 3.39%) and the 14TB Seagate model ST14000NM0138 (Q1: 6.82%, Q2: 4.37%).
One new drive model on the way in: Welcome to the party, Toshiba MG09ACA16TE (16TB).
Zero failures for the quarter: Rising to the top, we’ve got only two this time around:
- Seagate ST8000NM000A (8TB)
- Seagate ST16000NM002J (16TB)

That 8TB Seagate is really shining, given this is its third quarter running with zero failures.

Bonus: One-failure drives: Since we only have two zero-failure models (and that just seems a little lackluster, doesn’t it?), it’s also worth mentioning the drive models with only one failure this quarter:
- HGST HMS5C4040BLE640 (4TB)
- Seagate ST12000NM000J (12TB)
- Seagate ST14000NM000J (14TB)
- Toshiba MG09ACA16TE (16TB)

Drive model criteria

We noted earlier we removed 495 drives from consideration when we produced the table above covering Q2 2025. There are two primary reasons we did not consider these drive models.

Testing. These are drives of a given model that we monitor and collect Drive Stats data on, but are not considered production drives at this time. For example, drives undergoing certification testing to determine if they are performant enough for our environment are not included in our Drive Stats calculations.
Insufficient data points. When we calculate the annualized failure rate for a drive model for a given period of time (quarterly, annual, or lifetime), we want to ensure we have enough data to reliably do so. Therefore we have defined criteria for a drive model to be included in the tables and charts for the specified period of time. Models that do not meet these criteria are not included in the tables and charts for the period in question.
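For context, an annualized failure rate is typically computed from failures and drive days rather than from drive counts alone. The sketch below uses the formula Backblaze has described in earlier Drive Stats reports, failures divided by drive years, expressed as a percentage; the example figures are hypothetical.

```python
def annualized_failure_rate(failures, drive_days):
    """AFR in percent: failures per drive-year of operation."""
    if drive_days <= 0:
        raise ValueError("drive_days must be positive")
    drive_years = drive_days / 365
    return 100 * failures / drive_years


# Hypothetical quarter: 1,000 drives running all 91 days, 9 failures.
print(round(annualized_failure_rate(9, 1_000 * 91), 2))
```

Because the denominator is drive days, a model with few drives or little time in service produces a noisy rate, which is why minimum data thresholds matter.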

A table that outlines the drive inclusion parameters for each type of Backblaze Drive Stats report.

Regardless of whether or not a given drive model is included in the charts and tables, all of the data for all of the drives we use is included in our Drive Stats dataset which you can download by visiting our Drive Stats page.

As with the Q2 quarterly results, we will apply these criteria to the lifetime charts that follow in this report.

Lifetime hard drive failure rates

To be considered for the lifetime review, a drive model was required to have 500 or more drives as of the end of Q2 2025 and have over 100,000 accumulated drive days during their lifetime. When we removed those drive models which did not meet the lifetime criteria, we had 393,907 drives grouped into 27 models remaining for analysis as shown in the table below.

Backblaze Hard Drive Failure Rates for Q2 2025

Reporting period ending June 30, 2025
Drive models > 500 drives and > 100,000 lifetime drive days

A table showing the lifetime Backblaze Drive Stats.

Notes and observations

Again, the lifetime AFR holds steady, dropping from Q1 2025’s 1.31% to 1.30%.

Now you see me: This quarter’s table also gives us an interesting snapshot that has to do with our drive exclusions as the 4TB HGST model HMS5C4040ALE640 is on the way out. It meets our lifetime drive criteria, so it is included in this second table, but it didn’t make the cut for the quarterly table because it had too few drives running by the end of the quarter. Usually you see the opposite, where drive models show up in the quarterly requirements but not the lifetime. This quarter, four models meet that standard (Seagate model numbers ST8000NM000A, ST14000NM000J, ST16000NM002J, and Toshiba MG09ACA16TE).
Smaller drives getting older: Perhaps an unsurprising trend—Backblaze’s smaller capacity drives are getting older. We have a total of 13 drive models with 12TB or less, with a collective 1.54% failure rate. See the table below:

Backblaze drives with ≤12TB capacity

An image showing drives that are less than or equal to 12TB, including color coding to indicate drive age.

Of those models, eight are five years old or older (shown in purple). An additional two models are four years or older (that’s your orange). Taking just these 10 models—drives reaching their supposed golden years—we have a collective AFR of 1.42%.

Notably, that AFR is due to some well-performing low-failure outliers, including both of the 4TB Seagate models (0.57% and 0.40%), the 12TB HGST model HUH721212ALE600 (0.56%), and the 12TB Seagate model ST12000NM001G (0.99%).

That said, it’s also perhaps more impressive that when we say “eight are five years and older,” of those eight drive models, five are six or more years old. Their collective AFR is 1.33%.

Drive models that are less than or equal to 12TB and that are 6 or more years old.

This raises the age-old question: Is age just a number? Or are we just seeing several exceptional drive models? In any event, it’s an interesting drive population to keep an eye on, as it represents 156,724 of the 393,907 drives (~40%) in the lifetime drive pool.

Zoom in: The 20TB+ club

We’ve been taking quick peeks at the 20TB+ drives in the last few reports, but it’s high time we dig in a bit deeper. Right now, our cohort of 20TB+ drives that meet the lifetime criteria consists of three drive models: the 20TB Toshiba model MG10ACA20TE, the 22TB WDC model WUH722222ALE6L4, and the 24TB Seagate model ST24000NM002H. Quite neatly, that also gives us one per manufacturer, lending itself to something of a head-to-head comparison. Of course, given the variability we see between drive models from the same manufacturer, we won’t read too much significance into it.

Let’s take a look at each.

20TB Toshiba MG10ACA20TE

The Toshiba has actually been in our drive pool for 22 months, but until just under a year ago, there were only two drives. For the purposes of significance, then, we’ll exclude significantly low numbers of drives—thankfully, each model has something of a natural fall-off point where they go from single-digit drive numbers to hundreds.

For the Toshiba, that gives us the following data:

A table showing the AFRs for the 20TB Toshiba based on their age.

Converted to a graph, we end up with the following:

A graph showing the failure rates based on age for the 20TB Toshiba drives.

On this graph, the blue line represents the AFR and the red line represents the drive count. Drive count can be a bit tricky since our x-axis is age, and we start with age=0, which means that the drive count (from our perspective) goes from larger to smaller. That is, as drives get older, there are fewer of them by count—you have your initial purchase cohort, then you add drives over time. You can read this as the first data point representing drives that are between 0–1 month old, the next data point as 1–2 months old, etc.

We set it up this way because we wanted to be able to directly compare the failure rates of the drives based on their ages. Those familiar with our bathtub curve analysis may recognize our methodology here—we’re just zooming in on specific drives and drive capacities.

22TB WDC WUH722222ALE6L4

Now let’s take a look at the WDC model. We have usable data for about 21 months of its drive life:

A table showing the AFRs for the 22TB WDC drives based on their age.

Which gives us the following visualization:

A graph showing the AFRs for the 22TB WDC drives based on their age.

Interestingly, we see a lot less variability in the span of time where we have a direct comparison. That said, the WDC model also had a minimum of double the drive count if we’re looking at a similar time period—so, at their youngest (0 months old) the Toshiba had 14,407 drives vs. WDC’s 37,363; and, at 11 months Toshiba had 1,034 drives vs. WDC’s 13,965.

While AFRs do get us mostly on an even playing field as far as being able to make a 1:1 comparison, it’s important to remember that in smaller drive pools, a single failure can be amplified by quite a bit.
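A quick calculation shows how large that amplification can be. Using the two 11-month drive counts quoted above, and assuming for simplicity that every drive ran for a full 30-day month, this sketch compares what a single additional failure does to each pool's AFR.

```python
def afr_percent(failures, drive_count, days=30):
    """AFR for a pool where every drive ran for the whole period (assumed)."""
    drive_days = drive_count * days
    return 100 * failures / (drive_days / 365)


for pool in (1_034, 13_965):
    bump = afr_percent(1, pool)
    print(f"{pool:>6} drives: one extra failure adds {bump:.2f} points of AFR")
```

Under these assumptions, the same single failure moves the smaller pool's monthly AFR by more than a full percentage point and the larger pool's by under a tenth of one.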

24TB Seagate ST24000NM002H

Our youngest drive model, the 24TB Seagate ST24000NM002H, has just half a year of data.

A table showing the AFRs for the 24TB Seagate drives based on their age.

That gives us the following visualization:

A graph showing the AFRs for the 24TB Seagate drives based on their age.

Compared with our other two drive models, the 24TB Seagate definitely has the highest failure rates. This could be explained, in part, by it being a young drive: Is it on the leading edge of a traditional bathtub curve? It’s certainly something to track over time to see if it settles as it gets older.

All together now: Comparing the 20TB+ drives

We designed this view to be directly comparable at points in time, so, here’s your graph that puts each drive on the same time scale:

A graph comparing the AFRs for the 20TB+ drives based on their age.

What’s our takeaway here? Well, in both drive count and length of time in the pool, it’s a little early to identify definitive trends for the Seagate and the Toshiba. Certainly we can see that the Seagate is, early on, showing higher failure rates. Meanwhile, the 20TB Toshiba has had a bit of a variable year one. But again, with significantly variable drive pools between all models, we’re not quite comparing apples to apples. (We chose not to plot drive count on this chart; it gets messy quickly.) Add to that the fact that the Seagate in particular is potentially at the beginning of the “bathtub” curve, so we may see it change over time.

On the other hand, the 22TB WDC model has shown up quite a bit below our current average AFR for the drive pool of all drive sizes and ages, and it’s the model with the most data. But, how does that compare to other models as they come online?

Comparison: 20TB+ as a pool vs. 14-16TB as a pool

When we were considering whether this information would be a useful slice of the data, our biggest question was how to contextualize it versus other drives. It’s perhaps a tad imperfect, but we landed on combining the 14–16TB drives as a pool, largely because they have a significant number of data points and were our last set of drives onboarded, which means that they’re more or less the last generation of hardware.

The other thing to note is that once we combined the 20TB drives into a pool, some of the data we filtered out on a per-drive basis got added back in. So, at the 21 month mark, where the Toshiba model only had one drive, we added that single drive to the 399 that our WDC model brought to the table and calculated AFR across the pool (giving us 400 drives to work with).
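As a sketch of what pooling means in practice: rather than averaging each model's AFR, you sum failures and drive days across models and compute one rate. The drive counts at 21 months (1 Toshiba, 399 WDC) come from the text; the `ModelBucket` type, the 30-day bucket, and the failure counts are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ModelBucket:
    """One drive model's activity within a single age bucket."""
    model: str
    drive_days: int
    failures: int


def pooled_afr(buckets: list[ModelBucket]) -> float:
    """AFR (%) across all models: total failures over total drive-years."""
    total_failures = sum(b.failures for b in buckets)
    total_drive_days = sum(b.drive_days for b in buckets)
    if total_drive_days == 0:
        return 0.0
    return total_failures / (total_drive_days / 365) * 100


DAYS_IN_BUCKET = 30  # assumption: one age bucket covers roughly 30 days

# 21-month bucket: drive counts from the text, failure counts made up.
month_21 = [
    ModelBucket("20TB Toshiba", drive_days=1 * DAYS_IN_BUCKET, failures=0),
    ModelBucket("22TB WDC", drive_days=399 * DAYS_IN_BUCKET, failures=1),
]

pool_afr_21 = pooled_afr(month_21)
print(f"Pooled AFR at 21 months: {pool_afr_21:.2f}%")
```

Summing first means the lone Toshiba drive contributes its 30 drive days to the denominator instead of showing up as its own wildly noisy data point.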

Here are the numbers for the 20TB+ drive pool:

A table combining the AFRs for the 20TB+ drives based on their age.

That gives us a pretty neat graph, actually:

A graph combining the AFRs for the 20TB+ drives based on their age.

Now, let’s compare to the 14–16TB drives of the same age. We have significant data for this population for nearly seven years, but in the interest of saving you three pages of scrolling, I’ll give you the table for the data that directly correlates to the 21 months of data we have for the 20TB+ drives.

A table combining the AFRs for the 14-16TB drives based on their age.

This is the line chart for that range of time:

A graph combining the AFRs for the 14-16TB drives based on their age, showing only the time period from 0-21 months.

Comparing age of drive to age of drive, it would seem that our 20TB+ drives are right on target, and perhaps doing a bit better than expected. But that definitely isn’t a perfect comparison, given that the 14–16TB drives have a steadier and larger drive count. So, let’s look at the chart with the full, nearly seven-year time period:

A graph combining the AFRs for the 14-16TB drives based on their age, showing the full age of the drive pool, 81 months.

This view starts to show us some spiky patterns as the 14–16TB drives get older, of course exacerbated by drive counts reducing over time.

So what’s it all mean?

It’s clear from the data that we need to give the 20TB+ drives time to mature, and that as we (depending on our buying behavior, of course) add more drives, we might see some interesting changes in the data.

As for the 14–16TB pool, they’re following relatively expected patterns of wearing out in the five-plus year range—but what does that mean when you compare to what we observed in our current lifetime stats, where we see our 12TB and smaller drive pool performing so well?

Without taking a closer look at the 14–16TB drives, it’s hard to rule out that they have outlier tendencies similar to those of the 12TB and smaller pool, just pulling the failure rates upward. Even a casual glance at our current lifetime table’s 14–16TB drives bears that out (four years and older highlighted in orange, as our corollary above):

An image showing drives that are 14-16TB drives, including color coding to indicate drive age.

That data isn’t inclusive of all of the 14–16TB drives we’ve ever had, though—just those currently in operation. So, as always, there’s more investigation to be done.

The Hard Drive Stats data

The complete dataset used to create the tables and charts in this report is available on our Hard Drive Test Data page. You can download and use this data for free for your own purpose. All we ask are three things: 1) you cite Backblaze as the source if you use the data, 2) you accept that you are solely responsible for how you use the data, and 3) you do not sell this data itself to anyone; it is free.

Good luck, and let us know if you find anything interesting.

The post Backblaze Drive Stats for Q2 2025 appeared first on Backblaze Blog | Cloud Storage & Cloud Backup

Cloud Storage Myths Debunked: Hyperscaler Storage Is Good Enough for Cloud-Native Apps

2025-07-31 David Johnson

Post Syndicated from David Johnson original https://www.backblaze.com/blog/cloud-storage-myths-debunked-hyperscaler-storage-is-good-enough-for-cloud-native-apps/

The big cloud providers already offer everything you need, including storage. So, why complicate things, right?

At first glance, that sounds convincing. Hyperscalers like AWS, Azure, and Google Cloud offer massive service catalogs, global infrastructure, and a wide range of storage options. For many teams, they seem like a convenient one-stop shop.

But in practice, things aren’t nearly as straightforward.

While hyperscalers offer extensive storage capabilities, their multi-tier systems prioritize versatility over optimization. The result? Hidden costs and performance headaches that cloud-native teams can’t afford to ignore.

The claim that hyperscaler storage meets all cloud-native needs because of scale and functionality is a stubborn myth, one of many that still permeate the development landscape.

This post kicks off a blog series tackling these myths and misconceptions about specialized cloud storage and what a best-of-breed, interoperable approach to storage and infrastructure entails.

New Cloud Native Times Call for New Cloud Storage Approaches

Learn more about how the open cloud supports faster development, improved workflows, and reduced cost complexity in our free ebook, “New Cloud Native Times Call for New Cloud Storage Approaches.”

Reading the fine print of hyperscaler storage

On the surface, hyperscaler storage looks comprehensive and capable. But dig a little deeper, and some underlying cracks start to show.

Premium performance isn’t the default

Hyperscalers can deliver high performance, but not without tradeoffs:

They charge more. Premium tiers designed for workloads like analytics or streaming can cost five to eight times more than interoperable solutions.

They prioritize themselves. When hyperscalers face high-performance demands (e.g., AI workloads competing for GPUs and storage bandwidth), they tend to prioritize their own data centers. Smaller teams might have to navigate opaque processes to request higher performance, and their access to advanced optimizations can be limited.

They play favorites. File size adds yet another layer of difficulty. Many hyperscaler storage systems handle large files more efficiently than small ones because of I/O overhead. Hyperscalers may help their biggest customers fine-tune configurations, but most are left to troubleshoot bottlenecks on their own.

Juggling tiers (and hoping nothing gets dropped)

Hot, cool, and cold storage options may look flexible on paper, but they require separate access controls, replication rules, and performance tuning. Teams are left juggling interfaces like AWS Identity and Access Management (IAM), scripting policies, and managing tooling just to keep systems functional.

And the more storage types you manage, the greater the chance for human error. A misplaced lifecycle rule or a mistyped IAM permission can result in:

Unexpected data unavailability
Delayed retrievals
Accidental deletions

When complexity undermines reliability

Keeping storage tied tightly to hyperscaler infrastructure may seem efficient—but it often results in brittle setups. Misaligned storage, compute, and access layers can lead to latency issues or even full-blown downtime.

Performance-sensitive applications like real-time analytics or video streaming suffer most. Even a small delay can ripple through the user experience and cause customer churn. To patch gaps, teams often layer on caches, fine-tuning, or quick fixes that only add technical debt.

Who has time to babysit storage?

Developers, DevOps, and site reliability engineers (SREs) are always racing to ship features, scale services, and maintain uptime. For cloud-native teams, optimizing storage isn’t usually at the top of anyone’s to-do list.

Let’s face it: proactively analyzing storage access patterns and configuring tiering rules takes time that cloud-native teams often don’t have. Many teams therefore operate reactively and address storage issues only after performance degrades or surprise bills arrive.

Support tickets don’t feel your pain

Finally, there’s support. Unless you’re a premium customer paying for top-tier service contracts, you’re often stuck with ticketing systems and community forums. That might suffice for routine issues, but when storage problems impact production workloads, waiting for responses through standard channels adds unnecessary stress and delays.

When one size doesn’t fit your cloud

Unlike hyperscaler storage that takes a one-size-fits-all approach, specialized cloud storage solutions directly tackle these challenges. Backblaze B2 is purpose-built to simplify storage for cloud-native teams:

A single, high-performance tier gives you instant access to all your data, with no tier juggling or lifecycle policies.

Predictable, transparent pricing means no unexpected fees or surprise retrieval charges.

S3-compatible APIs simplify integration, allowing you to plug Backblaze B2 directly into your existing cloud-native stack.

For cloud-native teams who value speed, simplicity, and cost control, specialized storage isn’t a complication; it’s a simplification. You get the performance you need, without the complexity you don’t.

Stay tuned for the next post in this series where we tackle Myth #2: Storage isn’t a big enough problem to remediate. (Spoiler: It is.)

The post Cloud Storage Myths Debunked: Hyperscaler Storage Is Good Enough for Cloud-Native Apps appeared first on Backblaze Blog | Cloud Storage & Cloud Backup

Leveling Up Security: New Enterprise Features in Backblaze B2 Platform Update

2025-07-29 Kari Rivas

Post Syndicated from Kari Rivas original https://www.backblaze.com/blog/leveling-up-security-new-enterprise-features-in-backblaze-b2-platform-update/

A decorative header showing a laptop with several icons, including files, a warning signal, and others.

Security teams are under constant pressure to stay ahead of increasingly sophisticated threats while enabling fast, reliable access to data across the business. Whether you’re protecting media assets, safeguarding backups, or supporting a global development workflow, your cloud storage needs to do more than store data—it needs to actively support your security posture.

To make that easier, we’ve launched a new set of enterprise-grade security features for Backblaze B2 Cloud Storage. These updates are designed to help organizations detect unusual activity faster, manage access more precisely, and strengthen visibility across their storage environments—all without added complexity or hidden costs.

These new tools build on the security foundations you already count on: Object Lock for ransomware protection, SOC 2 compliance, encryption, 3x free egress for disaster recovery, and more.

Here’s a look at what’s new and how it helps.

Smarter protection with Anomaly Alerts (Now in private preview)

Anomaly Alerts are your new AI-powered watchdog. This feature analyzes usage patterns in your B2 Cloud Storage buckets to detect potential red flags—like spikes in downloads or uploads beyond the baseline—that could signal a breach or exfiltration attempt.
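To give a feel for what "beyond the baseline" can mean, here's a deliberately simple sketch that flags a day whose transfer volume sits well above a trailing average. This is not how Anomaly Alerts works internally (those details aren't described here); the function name, the 7-day window, the 3-sigma threshold, and the sample numbers are all assumptions.

```python
from statistics import mean, stdev


def find_spikes(daily_volume, window=7, threshold=3.0):
    """Return indexes of days whose volume exceeds the trailing baseline.

    The baseline for day i is the previous `window` days. A day is flagged
    when it exceeds mean + threshold * spread, where spread is the standard
    deviation with a floor of 5% of the mean so flat baselines don't
    trigger on tiny wiggles.
    """
    spikes = []
    for i in range(window, len(daily_volume)):
        baseline = daily_volume[i - window:i]
        mu = mean(baseline)
        spread = max(stdev(baseline), 0.05 * mu)
        if daily_volume[i] > mu + threshold * spread:
            spikes.append(i)
    return spikes


# Hypothetical daily download volume in GB for one bucket.
downloads_gb = [100, 104, 98, 101, 97, 103, 99, 102, 100, 950, 101]
spike_days = find_spikes(downloads_gb)
print(spike_days)  # prints [9]
```

A production-grade detector would also need to handle seasonality and keep a detected spike from polluting the next week's baseline, which this sketch makes no attempt to do.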

If your team wants early access to this feature, drop us a line at [email protected] to join the private preview.

New enterprise web console & role-based access controls (Now in private preview)

Managing cloud storage across large teams just got a whole lot easier. We introduced a brand-new enterprise web console built for scalability and control. Combined with robust role-based access controls (RBAC), IT and security teams can now better align with zero-trust policies by enforcing the principle of least privilege across their organizations.

This console simplifies storage administration at scale—whether you’re managing terabytes or petabytes.

Get an expert introduction to the enterprise web console.

If you’re a Backblaze customer with a committed contract, reach out to your Customer Success Manager (CSM) to see if you’re eligible to participate. Not sure who your CSM is? Email [email protected] for help.

Full visibility with Bucket Access Logs (Now generally available)

Need to know who touched what and when? Bucket Access Logs are now generally available, providing a detailed audit trail of every action in your B2 buckets—uploads, downloads, deletions, and more.

Learn more about querying Bucket Access Logs in this webinar.

They’re fully S3-compatible and configurable through both the Backblaze B2 web UI and API, supporting:

Security audits
Usage tracking
Forensics and threat investigation
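For a usage-tracking or forensics pass, a small script can go a long way. The sketch below assumes the log lines follow the Amazon S3 server access log layout (space-separated fields with a bracketed timestamp and a quoted request URI); the sample lines, bucket name, and requester names are made up, so check the field order against your own logs before relying on it.

```python
import re
from collections import Counter

# Leading fields of an S3-style access log line (assumed layout).
LOG_LINE = re.compile(
    r'^(?P<owner>\S+) (?P<bucket>\S+) \[(?P<time>[^\]]+)\] (?P<remote_ip>\S+) '
    r'(?P<requester>\S+) (?P<request_id>\S+) (?P<operation>\S+) (?P<key>\S+) '
    r'"(?P<request_uri>[^"]*)" (?P<status>\S+)'
)


def summarize(lines):
    """Count (requester, operation) pairs across the lines that parse."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match:
            counts[(match["requester"], match["operation"])] += 1
    return counts


# Hypothetical log lines for illustration.
sample_logs = [
    'owner1 media-bucket [29/Jul/2025:10:15:02 +0000] 192.0.2.10 alice REQ1 '
    'REST.GET.OBJECT video/a.mp4 "GET /video/a.mp4 HTTP/1.1" 200 - 1024 1024 12 8 "-" "curl/8.0" -',
    'owner1 media-bucket [29/Jul/2025:10:15:09 +0000] 192.0.2.10 alice REQ2 '
    'REST.GET.OBJECT video/b.mp4 "GET /video/b.mp4 HTTP/1.1" 200 - 2048 2048 15 9 "-" "curl/8.0" -',
    'owner1 media-bucket [29/Jul/2025:10:16:40 +0000] 198.51.100.7 bob REQ3 '
    'REST.DELETE.OBJECT video/a.mp4 "DELETE /video/a.mp4 HTTP/1.1" 204 - - - 20 - "-" "curl/8.0" -',
]

access_summary = summarize(sample_logs)
for (requester, operation), count in sorted(access_summary.items()):
    print(requester, operation, count)
```

Lines that don't match the pattern are skipped silently here; for a real audit you'd likely want to count and inspect those too.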

Real-time Event Notifications

Time matters when it comes to spotting and stopping threats. With Event Notifications, you can get real-time alerts on changes to your bucket contents—think object creations, deletions, or modifications—so your team can jump into action immediately.

This is especially valuable for compliance teams, incident response workflows, or any operations team that wants tighter control over its cloud perimeter.
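If you receive notifications at a webhook endpoint, it's worth confirming each request really came from your storage provider before acting on it. The sketch below shows the general HMAC technique; the `v1=<hex digest>` header format, the HMAC-SHA256 scheme, and the payload fields are assumptions for illustration, so confirm the actual header name and signing scheme in the Event Notifications documentation.

```python
import hashlib
import hmac


def is_valid_signature(body: bytes, signature_header: str, signing_secret: str) -> bool:
    """Check a webhook body against its signature header.

    Assumed header format: "v1=<hex HMAC-SHA256 of the raw request body>".
    """
    version, _, received = signature_header.partition("=")
    if version != "v1" or not received:
        return False
    expected = hmac.new(signing_secret.encode(), body, hashlib.sha256).hexdigest()
    # compare_digest avoids leaking information through timing differences.
    return hmac.compare_digest(expected.encode(), received.encode())


# Hypothetical secret and payload, signed locally to exercise the check.
secret = "example-signing-secret"
body = b'{"events": [{"eventType": "object-deleted", "objectName": "a.mp4"}]}'
header = "v1=" + hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()

print(is_valid_signature(body, header, secret))         # prints True
print(is_valid_signature(body + b" ", header, secret))  # prints False
```

Verify against the raw bytes you received, before any JSON parsing or re-serialization, since even a whitespace change produces a different digest.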

Watch our hands-on Event Notifications demo to learn more about how to streamline cloud storage management.

Multi-Bucket and Scalable Application Keys

Security and scalability should go hand in hand. With Multi-Bucket Application Keys, you can now create access keys that cover specific groups of buckets, giving you more flexibility without going full wildcard. This enhancement provides more granular control over bucket access, contributing to a reduced attack surface.

And, with Scalable Application Keys, you can generate up to 10,000 short-lived keys per minute. This capability enhances security by limiting the exposure window of individual keys, thus reducing the attack surface for endpoint-generated content and high-volume data operations.

Custom Upload Timestamps

Custom Upload Timestamps allow you to specify when an object was originally created or uploaded. This is a critical feature for:

Regulatory compliance
Accurate version tracking
Legal and audit requirements

Built for a Secure, Open Cloud

Security isn’t a one-time add-on; it’s an ongoing promise. As enterprises scale and integrate cloud storage into more parts of their workflow, from backup and archiving to AI pipelines, they need solutions that support open cloud strategies without compromising their data.

This update is another step forward in our mission to provide developers, IT teams, and enterprises with cloud storage that’s secure by design, simple to use, and affordable at scale. Ready to get started with Backblaze B2? Contact our Sales team today.

The post Leveling Up Security: New Enterprise Features in Backblaze B2 Platform Update appeared first on Backblaze Blog | Cloud Storage & Cloud Backup

Legal Hold Is Here: Protect Your Business When It Matters Most

2025-07-24 Natasha Rabinov

Post Syndicated from Natasha Rabinov original https://www.backblaze.com/blog/legal-hold-is-here-protect-your-business-when-it-matters-most/

A decorative image showing several lock icons.

Whether you’re navigating HR issues, facing down litigation, or ensuring operational readiness in the face of uncertainty, you need to be ready to preserve your data. When the stakes are high, Legal Hold, a new feature in Backblaze Computer Backup with Enterprise Control, can help you stay ready.

Available today, Legal Hold gives administrators the power to preserve every version of a user’s backup with a single click. No extra hardware, no new software—all at the same flat-rate pricing of Backblaze Computer Backup with Enterprise Control.

Let’s dig into what Legal Hold is, its importance, and how Backblaze implements it to meet enterprise needs.

What is Legal Hold?

A legal hold, also known as a litigation hold, is a process that organizations use to preserve electronically stored information (ESI) when they face actual or anticipated litigation, audits, or investigations. It ensures that relevant data—such as emails, documents, and file backups—is not deleted, altered, or lost. Once enabled, Backblaze Computer Backup’s Legal Hold feature will preserve a user’s entire backup, including every historical version captured, with a single click.

A legal hold is typically triggered when an organization becomes aware of a legal claim or regulatory inquiry. Once in place, normal data retention policies are suspended for any affected data, ensuring it remains available for legal review.

How Backblaze Legal Hold helps you stay protected

At Backblaze, we’ve designed our Legal Hold for Computer Backup feature to be powerful, simple, and reliable. Here’s how it works:

Instant activation: Instantly activate Legal Hold in the Enterprise Control console without additional hardware or software.
Automated data preservation: Apply a Legal Hold to any user’s backup directly from your admin console. The backups are preserved in a fixed state, meaning no files can be altered or deleted—even by retention policies.
Remote and silent enforcement: Legal Holds are applied remotely without disrupting the user’s work, alerting the user, or requiring their involvement. They run silently in the background without downtime, throttling, or notifications.
Retention beyond the device: Even if the original device is lost, stolen, or wiped, all held data remains safely stored in Backblaze.
Secure by default: Encryption at rest and in transit, with optional private key encryption, keeps data safe.

Why Legal Hold matters in 2025

In today’s landscape, Legal Hold isn’t just a “nice to have.” It’s a must-have for almost every organization:

Rising litigation and audits: Businesses face more legal scrutiny than ever—whether it’s an employee dispute, intellectual property (IP) protection, or a customer complaint.
Remote and hybrid workforces: With data scattered across devices and locations, you need a solution that protects endpoint backups no matter where the user is.
Cybersecurity incidents and data loss: Legal Hold ensures that even during a ransomware attack or internal breach, copies of critical data are preserved for investigation or recovery.
Cloud-first operations: Legal Hold needs to work where your data lives—securely in the cloud, always ready when you are.

Ready when you need it most

Now, any business using Backblaze Computer Backup with Enterprise Control can implement Legal Hold in just a few clicks—making it easier than ever to stay compliant, reduce legal risk, and prepare for the unexpected.

Already a customer? You can start using Legal Hold today. See our docs article or log in to your admin console.

Not yet on Backblaze? Reach out to our Sales team to start a free 15-day trial.

The post Legal Hold Is Here: Protect Your Business When It Matters Most appeared first on Backblaze Blog | Cloud Storage & Cloud Backup

Architecting Your AI Data Pipeline Using B2 Overdrive

2025-07-22 Jeronimo De Leon

Post Syndicated from Jeronimo De Leon original https://www.backblaze.com/blog/architecting-your-ai-data-pipeline-using-b2-overdrive/

A decorative image showing cloud storage and AI icons.

When you think about cloud infrastructure for AI, you immediately think of GPUs and other high-performance compute resources, and how your cloud architecture should be optimized to make the most of these expensive compute plans. But compute isn’t the only cloud product category you need to monitor to both scale your application and maintain a sustainable cloud infrastructure budget.

What ultimately fuels AI? Data—lots and lots of data. As part of a healthy AI pipeline, several versions of the same dataset need to be stored in a centralized repository, or multiple repositories if your strategy requires splitting data into cold vs. hot storage to reduce storage costs. For text-based LLMs, storage costs are minimal compared to compute resources. But as AI innovation increasingly relies on video and other media, both the base storage cost and data retrieval fees can make cloud bills spiral out of control.

In this blog, we’re taking a look at the AI data pipeline, where object storage sits in each stage, and how leveraging both Backblaze B2 and B2 Overdrive helps both increase performance and reduce costs for AI applications.

AI data pipeline stages

There are five key AI data pipeline stages where data retrieval and overall performance are critical, and this performance starts with your designated data storage backend.

Data ingest and active archive: Data is gathered from multiple designated sources (including APIs, internet of things (IoT) sensors, relational databases, etc.) and ingested into a centralized repository or multiple repositories.
Data processing: The raw data is transformed and enriched based on the model’s data parameters. This can range from relatively simple text cleanup to adding annotations and metadata. Feature engineering is performed to extract or construct meaningful attributes. All data is then converted into numerical representations (e.g., embeddings, vectors) suitable for model training and inference.
Model experimentation and training: Processed data is used to train models by learning underlying patterns. Iterative experiments in a test environment evaluate, tune, and improve model performance and accuracy.
Model deployment and inference: New data is prepared in the same way as during training and sent to the deployed model to generate predictions, support decision-making, and deliver personalized outputs.
Monitoring: Continuous monitoring tracks model performance, detects data drift, and flags potential bias, ensuring the model remains accurate and reliable over time.

Keep in mind that data ingestion and processing aren’t always sequential, such as when data is collected and ingested but corruption is detected during processing. Ideally, your pipeline is configured with validation gates so that corrupt data is identified and handled before proceeding to downstream steps like testing, training, and production deployment.
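One simple form of validation gate is a checksum comparison: each ingested object carries the hash recorded at its source, and anything that doesn't match gets quarantined instead of flowing downstream. The record layout, field names, and sample data below are hypothetical; real pipelines typically add schema and range checks on top of this.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hex string."""
    return hashlib.sha256(data).hexdigest()


def validation_gate(records):
    """Split records into (valid, quarantined) by comparing checksums."""
    valid, quarantined = [], []
    for record in records:
        if sha256_hex(record["payload"]) == record["expected_sha256"]:
            valid.append(record)
        else:
            quarantined.append(record)
    return valid, quarantined


# Hypothetical ingested objects; the second was corrupted in transit.
good_payload = b"sensor-42,21.5"
ingested = [
    {"name": "a.csv", "payload": good_payload,
     "expected_sha256": sha256_hex(good_payload)},
    {"name": "b.csv", "payload": b"sensor-43,\x00\x00",
     "expected_sha256": sha256_hex(b"sensor-43,19.0")},
]

valid, quarantined = validation_gate(ingested)
print([r["name"] for r in valid])        # prints ['a.csv']
print([r["name"] for r in quarantined])  # prints ['b.csv']
```

Only the `valid` list should move on to processing and training; quarantined items can be re-fetched from the source or flagged for review.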

When using cloud object storage as your data repository, one factor in selecting a plan (like cold versus hot storage) is the type of data ingestion you use, which depends on both the data source and the AI model’s specific needs.

Batch ingestion is better suited for mid to lower performance storage, as this is typically used for historical datasets or a set schedule of pre-determined data updates, such as jobs pulling from relational databases or CSV uploads once a day or once per week.
Streaming ingestion is well-suited for hot storage to support a continuous stream of real-time (or near-real-time) data processing, such as from social media feeds and high-volume e-commerce AI helper agents.
Hybrid ingestion uses a combination of batch and streaming ingestion to handle both historical and real-time data requirements for AI models.

Where does cloud object storage sit in the AI data pipeline?

Everywhere. All scalable data pipelines lead to object storage.

Why? Data ingestion and active archive are the major areas where object storage fulfills an important purpose. When training AI models, especially in production, data scalability for multiple and diverse data types is a hard requirement. But object storage plays a key role in the other pipeline stages:

Data processing: Stores versioned outputs from data labeling, feature engineering, and cleaning processes.
Model experimentation and training: Provides high-throughput access to training datasets and stores model checkpoints.
Model deployment and inference: Stores serialized model artifacts with API-based retrieval for serving predictions at scale.
Monitoring: Stores synthetic outputs from generative models, logs, feedback, and performance metrics for analysis and reuse.

For both AI data performance and cost optimization, selecting an object storage product or tier is far from one-size-fits-all. You can strategically allocate your data to B2 Cloud Storage or B2 Overdrive, with your most essential model data stored in B2 Overdrive. Here’s a high-level diagram of what Backblaze B2 product to use for each stage, including examples of the data stored at each stage.

Learn more at Ai4 in August

Want to learn more? Backblaze is heading to Las Vegas for Ai4 August 11–13! In addition to booking a meeting to speak with our storage experts and stopping by our booth to pick up some swag, I’m excited to talk more about the AI data pipeline during my talk. If you’re attending Ai4, add The AI Pipeline Starts with Storage: Architecting Scalable Data Foundations to your conference agenda.

Can’t attend live in Vegas? Reach out to our Sales team to talk about your specific use case and how B2 Overdrive can help propel your data.

The post Architecting Your AI Data Pipeline Using B2 Overdrive appeared first on Backblaze Blog | Cloud Storage & Cloud Backup

Simplify serverless development with console to IDE and remote debugging for AWS Lambda

2025-07-17 Micah Walter

Post Syndicated from Micah Walter original https://aws.amazon.com/blogs/aws/simplify-serverless-development-with-console-to-ide-and-remote-debugging-for-aws-lambda/

Today, we’re announcing two significant enhancements to AWS Lambda that make it easier than ever for developers to build and debug serverless applications in their local development environments: console to IDE integration and remote debugging. These new capabilities build upon our recent improvements to the Lambda development experience, including the enhanced in-console editing experience and the improved local integrated development environment (IDE) experience launched in late 2024.

When building serverless applications, developers typically focus on two areas to streamline their workflow: local development environment setup and cloud debugging capabilities. While developers can bring functions from the console to their IDE, they’re looking for ways to make this process more efficient. Additionally, as functions interact with various AWS services in the cloud, developers want enhanced debugging capabilities to identify and resolve issues earlier in the development cycle, reducing their reliance on local emulation and helping them optimize their development workflow.

Console to IDE integration

To address the first challenge, we’re introducing console to IDE integration, which streamlines the workflow from the AWS Management Console to Visual Studio Code (VS Code). This new capability adds an Open in Visual Studio Code button to the Lambda console, enabling developers to quickly move from viewing their function in the browser to editing it in their IDE, eliminating the time-consuming setup process for local development environments.

The console to IDE integration automatically handles the setup process, checking for VS Code installation and the AWS Toolkit for VS Code. For developers that have everything already configured, choosing the button immediately opens their function code in VS Code, so they can continue editing and deploy changes back to Lambda in seconds. If VS Code isn’t installed, it directs developers to the download page, and if the AWS Toolkit is missing, it prompts for installation.

To use console to IDE, look for the Open in VS Code button in either the Getting Started popup after creating a new function or the Code tab of existing Lambda functions. After selecting, VS Code opens automatically (installing AWS Toolkit if needed). Unlike the console environment, you now have access to a full development environment with integrated terminal – a significant improvement for developers who need to manage packages (npm install, pip install), run tests, or use development tools like linters and formatters. You can edit code, add new files/folders, and any changes you make will trigger an automatic deploy prompt. When you choose to deploy, the AWS Toolkit automatically deploys your function to your AWS account.

Screenshot showing Console to IDE

Remote debugging

Once developers have their functions in their IDE, they can use remote debugging to debug Lambda functions deployed in their AWS account directly from VS Code. The key benefit of remote debugging is that it allows developers to debug functions running in the cloud while integrated with other AWS services, enabling faster and more reliable development.

With remote debugging, developers can debug their functions with complete access to Amazon Virtual Private Cloud (VPC) resources and AWS Identity and Access Management (AWS IAM) roles, eliminating the gap between local development and cloud execution. For example, when debugging a Lambda function that interacts with an Amazon Relational Database Service (Amazon RDS) database in a VPC, developers can now debug the execution environment of the function running in the cloud within seconds, rather than spending time setting up a local environment that might not match production.

Getting started with remote debugging is straightforward. Developers can select a Lambda function in VS Code and enable debugging in seconds. AWS Toolkit for VS Code automatically downloads the function code, establishes a secure debugging connection, and enables breakpoint setting. When debugging is complete, AWS Toolkit for VS Code automatically cleans up the debugging configuration to prevent any impact on production traffic.

Let’s try it out

To take remote debugging for a spin, I chose to start with a basic “hello world” example function, written in Python. I had previously created the function using the AWS Management Console for AWS Lambda. Using the AWS Toolkit for VS Code, I can navigate to my function in the Explorer pane. Hovering over my function, I can right-click (Ctrl-click on macOS) to download the code to my local machine and edit it in my IDE. Saving the file prompts me to decide whether I want to deploy the latest changes to Lambda.

Screenshot view of the Lambda Debugger in VS Code

From here, I can select the play icon to open the Remote invoke configuration page for my function. This dialog will now display a Remote debugging option, which I configure to point at my local copy of my function handler code. Before choosing Remote invoke, I can set breakpoints on the left anywhere I want my code to pause for inspection.

My code will be running in the cloud after it’s invoked, and I can monitor its status in real time in VS Code. In the following screenshot, you can see I’ve set a breakpoint at the print statement. My function will pause execution at this point in my code, and I can inspect things like local variable values before either continuing to the next breakpoint or stepping into the code line by line.
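The function itself isn't reproduced in this post, so here's a minimal sketch of what a comparable Python "hello world" handler might look like. The `name` event field and the response shape are assumptions; `lambda_handler` is simply the conventional handler name for Python functions.

```python
import json


def lambda_handler(event, context):
    """Minimal handler: greet the caller and return an HTTP-style response."""
    name = event.get("name", "world")  # hypothetical event field
    message = f"Hello, {name}!"
    print(message)  # a breakpoint set on this line pauses the invocation here
    return {
        "statusCode": 200,
        "body": json.dumps({"message": message}),
    }


# Local smoke test. Lambda normally supplies a context object; it's unused here.
response = lambda_handler({"name": "Lambda"}, None)
print(response)
```

With a breakpoint on the `print` line, the debugger would show `event`, `name`, and `message` among the local variables while the function is paused.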

Here, you can see that I’ve chosen to step into the code, and as I go through it line by line, I can see the context and local and global variables displayed on the left side of the IDE. Additionally, I can follow the logs in the Output tab at the bottom of the IDE. As I step through, I’ll see any log messages or output messages from the execution of my function in real time.

Enhanced development workflow

These new capabilities work together to create a more streamlined development experience. Developers can start in the console, quickly transition to VS Code using the console to IDE integration, and then use remote debugging to debug their functions running in the cloud. This workflow eliminates the need to switch between multiple tools and environments, helping developers identify and fix issues faster.

Now available

You can start using these new features through the AWS Management Console and VS Code with the AWS Toolkit for VS Code (v3.69.0 or later) installed. Console to IDE integration is available in all commercial AWS Regions where Lambda is available, except AWS GovCloud (US) Regions. Learn more about it in the Lambda and AWS Toolkit for VS Code documentation. To learn more about the remote debugging capability, including the AWS Regions where it is available, visit the AWS Toolkit for VS Code and Lambda documentation.

Console to IDE and remote debugging are available to you at no additional cost. With remote debugging, you pay only for the standard Lambda execution costs during debugging sessions. Remote debugging will support Python, Node.js, and Java runtimes at launch, with plans to expand support to additional runtimes in the future.

These enhancements represent a significant step forward in simplifying the serverless development experience, which means developers can build and debug Lambda functions more efficiently than ever before.

Accelerate safe software releases with new built-in blue/green deployments in Amazon ECS

2025-07-17 Donnie Prakoso

Post Syndicated from Donnie Prakoso original https://aws.amazon.com/blogs/aws/accelerate-safe-software-releases-with-new-built-in-blue-green-deployments-in-amazon-ecs/

While containers have revolutionized how development teams package and deploy applications, these teams have had to carefully monitor releases and build custom tooling to mitigate deployment risks, which slows down shipping velocity. At scale, development teams spend valuable cycles building and maintaining undifferentiated deployment tools instead of innovating for their business.

Starting today, you can use the built-in blue/green deployment capability in Amazon Elastic Container Service (Amazon ECS) to make your application deployments safer and more consistent. This new capability eliminates the need to build custom deployment tooling while giving you the confidence to ship software updates more frequently with rollback capability.

Here’s how you can enable the built-in blue/green deployment capability in the Amazon ECS console.

You create a new “green” application environment while your existing “blue” environment continues to serve live traffic. After monitoring and testing the green environment thoroughly, you route the live traffic from blue to green. With this capability, Amazon ECS now provides built-in functionality that makes containerized application deployments safer and more reliable.

Below is a diagram illustrating how blue/green deployment works by shifting application traffic from the blue environment to the green environment. You can learn more at the Amazon ECS blue/green service deployments workflow page.

Amazon ECS orchestrates this entire workflow while providing event hooks to validate new versions using synthetic traffic before routing production traffic. You can validate new software versions in production environments before exposing them to end users and roll back near-instantaneously if issues arise. Because this functionality is built directly into Amazon ECS, you can add these safeguards by simply updating your configuration without building any custom tooling.

Getting started
Let me walk you through a demonstration that shows how to configure and use blue/green deployments for an ECS service. Before that, I need to complete a few setup steps, including configuring AWS Identity and Access Management (IAM) roles, which you can find on the Required resources for Amazon ECS blue/green deployments documentation page.

For this demonstration, I want to deploy a new version of my application using the blue/green strategy to minimize risk. First, I need to configure my ECS service to use blue/green deployments. I can do this through the ECS console, AWS Command Line Interface (AWS CLI), or using infrastructure as code.

Using the Amazon ECS console, I create a new service and configure it as usual:

In the Deployment Options section, I choose ECS as the Deployment controller type, then Blue/green as the Deployment strategy. Bake time is the period after production traffic has shifted to green during which instant rollback to blue remains available. When the bake time expires, blue tasks are removed.

We’re also introducing deployment lifecycle hooks. These are event-driven mechanisms you can use to augment the deployment workflow. I can select which AWS Lambda function I’d like to use as a deployment lifecycle hook. The Lambda function can perform the required business logic, but it must return a hook status.

Amazon ECS supports the following lifecycle hooks during blue/green deployments. You can learn more about each stage on the Deployment lifecycle stages page.

Pre scale up
Post scale up
Test traffic shift
Post test traffic shift
Production traffic shift
Post production traffic shift

For my application, I want to test when the test traffic shift is complete and the green service handles all of the test traffic. Since there’s no end-user traffic, a rollback at this stage will have no impact on users. This makes Post test traffic shift suitable for my use case as I can test it first with my Lambda function.
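
For those who configure the service through the API or infrastructure as code instead of the console, the choices above might map to a deployment configuration along these lines. This is a sketch only: the field names reflect my understanding of the Amazon ECS `deploymentConfiguration` structure and should be checked against the API reference. The ARNs, the helper name, and the five-minute bake time are placeholders.

```python
def blue_green_deployment_configuration(hook_lambda_arn, hook_role_arn, bake_time_minutes=5):
    """Build a deploymentConfiguration-style dict for a blue/green service.

    Assumed field names: strategy, bakeTimeInMinutes, lifecycleHooks,
    hookTargetArn, roleArn, lifecycleStages.
    """
    return {
        "strategy": "BLUE_GREEN",
        "bakeTimeInMinutes": bake_time_minutes,
        "lifecycleHooks": [
            {
                # Lambda function that validates the green revision
                "hookTargetArn": hook_lambda_arn,
                # IAM role that Amazon ECS assumes to invoke the function
                "roleArn": hook_role_arn,
                # Run the hook once test traffic has fully shifted to green
                "lifecycleStages": ["POST_TEST_TRAFFIC_SHIFT"],
            }
        ],
    }


config = blue_green_deployment_configuration(
    "arn:aws:lambda:us-west-2:123456789012:function:validate-green",  # placeholder
    "arn:aws:iam::123456789012:role/ecs-hook-invoke-role",  # placeholder
)
```

Keeping the configuration in a small builder like this makes it easy to reuse the same hook for several services while varying only the bake time.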

Switching context for a moment, let’s focus on the Lambda function that I use to validate the deployment before allowing it to proceed. In a Lambda function used as a deployment lifecycle hook, I can perform any business logic, such as synthetic testing, calling another API, or querying metrics.

Within the Lambda function, I must return a hookStatus. A hookStatus of SUCCEEDED moves the process to the next step. If the status is FAILED, Amazon ECS rolls back to the blue deployment. If it’s IN_PROGRESS, Amazon ECS retries the Lambda function in 30 seconds.
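
The three outcomes can be sketched as a small mapping from the state of a validation run to a hook response. The status strings match those used in the example function in this post (plus IN_PROGRESS); `to_hook_response` and `check_validation_run` are hypothetical names introduced only for illustration.

```python
def to_hook_response(test_finished: bool, test_passed: bool) -> dict:
    """Map the state of a validation run to a lifecycle hook response."""
    if not test_finished:
        # Ask Amazon ECS to invoke the hook again later.
        return {"hookStatus": "IN_PROGRESS"}
    if test_passed:
        # Let the deployment move on to the next stage.
        return {"hookStatus": "SUCCEEDED"}
    # Trigger a rollback to the blue revision.
    return {"hookStatus": "FAILED"}


def check_validation_run(event: dict) -> tuple:
    """Hypothetical placeholder that reports (finished, passed).

    A real hook might poll a test runner or query metrics here.
    """
    return True, True


def lambda_handler(event, context):
    finished, passed = check_validation_run(event)
    return to_hook_response(finished, passed)
```

Returning IN_PROGRESS is useful when the validation takes longer than a single invocation, because the hook can report on a long-running test across repeated invocations.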

In the following example, I set up my validation with a Lambda function that performs a file upload as part of a test suite for my application.

import json
import urllib3
import logging
import os

# Configure logging
logger = logging.getLogger()
logger.setLevel(logging.DEBUG)

# Initialize HTTP client
http = urllib3.PoolManager()

def lambda_handler(event, context):
    """
    Validation hook that tests the green environment with file upload
    """
    logger.info(f"Event: {json.dumps(event)}")
    logger.info(f"Context: {context}")
    
    try:
        # In a real scenario, you would construct the test endpoint URL
        test_endpoint = os.getenv("APP_URL")
        
        # Create a test file for upload
        test_file_content = "This is a test file for deployment validation"
        test_file_data = test_file_content.encode('utf-8')
        
        # Prepare multipart form data for file upload
        fields = {
            'file': ('test.txt', test_file_data, 'text/plain'),
            'description': 'Deployment validation test file'
        }
        
        # Send POST request with file upload to the endpoint configured in APP_URL
        response = http.request(
            'POST',
            test_endpoint,
            fields=fields,
            timeout=30
        )
        
        logger.info(f"POST response status: {response.status}")
        
        # Check if response has OK status code (200-299 range)
        if 200 <= response.status < 300:
            logger.info("File upload test passed - received OK status code")
            return {
                "hookStatus": "SUCCEEDED"
            }
        else:
            logger.error(f"File upload test failed - status code: {response.status}")
            return {
                "hookStatus": "FAILED"
            }
            
    except Exception as error:
        logger.error(f"File upload test failed: {str(error)}")
        return {
            "hookStatus": "FAILED"
        }

When the deployment reaches the lifecycle stage that is associated with the hook, Amazon ECS automatically invokes my Lambda function with deployment context. My validation function can run comprehensive tests against the green revision—checking application health, running integration tests, or validating performance metrics. The function then signals back to ECS whether to proceed or abort the deployment.

As I chose the blue/green deployment strategy, I also need to configure the load balancers and/or Amazon ECS Service Connect. In the Load balancing section, I select my Application Load Balancer.

In the Listener section, I use an existing listener on port 80 and select two Target groups.

Happy with this configuration, I create the service and wait for ECS to provision my new service.

Testing blue/green deployments
Now, it’s time to test my blue/green deployments. For this test, Amazon ECS will trigger my Lambda function after the test traffic shift is completed. My Lambda function will return FAILED in this case because it attempts a file upload to my application, but my application doesn’t have this capability.

I update my service and check Force new deployment, knowing the blue/green deployment capability will roll back if it detects a failure. I select this option because I haven’t modified the task definition but still need to trigger a new deployment.
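
Outside the console, the same step could be scripted with the AWS SDK for Python. The sketch below assumes boto3 is installed and credentials are configured. The cluster and service names are placeholders, and the two helper functions are hypothetical names introduced for illustration.

```python
def force_new_deployment_request(cluster: str, service: str) -> dict:
    """Build the arguments for an ECS UpdateService call that forces a deployment."""
    return {
        "cluster": cluster,
        "service": service,
        # Start a new deployment even though the task definition is unchanged
        "forceNewDeployment": True,
    }


def force_new_deployment(cluster: str, service: str) -> dict:
    """Send the request; requires boto3 and valid AWS credentials."""
    import boto3  # imported lazily so the request builder works without the SDK

    ecs = boto3.client("ecs")
    return ecs.update_service(**force_new_deployment_request(cluster, service))


# Placeholder names; replace with your own cluster and service.
request = force_new_deployment_request("my-cluster", "my-service")
```

Separating the request builder from the call keeps the arguments easy to inspect or log before anything is sent to Amazon ECS.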

At this stage, I have both blue and green environments running, with the green revision handling all the test traffic. Meanwhile, based on Amazon CloudWatch Logs of my Lambda function, I also see that the deployment lifecycle hooks work as expected and emit the following payload:

[INFO]	2025-07-10T13:15:39.018Z	67d9b03e-12da-4fab-920d-9887d264308e	Event: 
{
    "executionDetails": {
        "testTrafficWeights": {},
        "productionTrafficWeights": {},
        "serviceArn": "arn:aws:ecs:us-west-2:123:service/EcsBlueGreenCluster/nginxBGservice",
        "targetServiceRevisionArn": "arn:aws:ecs:us-west-2:123:service-revision/EcsBlueGreenCluster/nginxBGservice/9386398427419951854"
    },
    "executionId": "a635edb5-a66b-4f44-bf3f-fcee4b3641a5",
    "lifecycleStage": "POST_TEST_TRAFFIC_SHIFT",
    "resourceArn": "arn:aws:ecs:us-west-2:123:service-deployment/EcsBlueGreenCluster/nginxBGservice/TFX5sH9q9XDboDTOv0rIt"
}
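
A hook that serves several services or stages may want to pull a few fields out of this payload before deciding what to test. The following sketch relies only on the keys visible in the event above; the helper name and the shape of its return value are assumptions.

```python
def summarize_hook_event(event: dict) -> dict:
    """Extract the stage, cluster, and service from a lifecycle hook event."""
    details = event.get("executionDetails", {})
    service_arn = details.get("serviceArn", "")

    # The service ARN in the payload ends with .../<cluster>/<service>
    parts = service_arn.split("/")
    cluster, service = (parts[-2], parts[-1]) if len(parts) >= 3 else (None, None)

    return {
        "stage": event.get("lifecycleStage"),
        "cluster": cluster,
        "service": service,
        "revision_arn": details.get("targetServiceRevisionArn"),
    }


sample_event = {
    "executionDetails": {
        "serviceArn": "arn:aws:ecs:us-west-2:123:service/EcsBlueGreenCluster/nginxBGservice",
        "targetServiceRevisionArn": "arn:aws:ecs:us-west-2:123:service-revision/EcsBlueGreenCluster/nginxBGservice/9386398427419951854",
    },
    "lifecycleStage": "POST_TEST_TRAFFIC_SHIFT",
}
summary = summarize_hook_event(sample_event)
```

With a summary like this, a single function could branch on `stage` and run different checks for test and production traffic shifts.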

As expected, my AWS Lambda function returns FAILED as hookStatus because it failed to perform the test.

[ERROR]	2025-07-10T13:18:43.392Z	67d9b03e-12da-4fab-920d-9887d264308e	File upload test failed: HTTPConnectionPool(host='xyz.us-west-2.elb.amazonaws.com', port=80): Max retries exceeded with url: / (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x7f8036273a80>, 'Connection to xyz.us-west-2.elb.amazonaws.com timed out. (connect timeout=30)'))

Because the validation wasn’t completed successfully, Amazon ECS tries to roll back to the blue version, which is the previous working deployment version. I can monitor this process through ECS events in the Events section, which provides detailed visibility into the deployment progress.

Amazon ECS successfully rolls back the deployment to the previous working version. The rollback happens near-instantaneously because the blue revision remains running and ready to receive production traffic. There is no end-user impact during this process, as production traffic never shifted to the new application version—ECS simply rolled back test traffic to the original stable version. This eliminates the typical deployment downtime associated with traditional rolling deployments.

I can also see the rollback status in the Last deployment section.

Throughout my testing, I observed that the blue/green deployment strategy provides consistent and predictable behavior. Furthermore, the deployment lifecycle hooks provide more flexibility to control the behavior of the deployment. Each service revision maintains immutable configuration including task definition, load balancer settings, and Service Connect configuration. This means that rollbacks restore exactly the same environment that was previously running.

Additional things to know
Here are a couple of things to note:

Pricing – The blue/green deployment capability is included with Amazon ECS at no additional charge. You pay only for the compute resources used during the deployment process.
Availability – This capability is available in all commercial AWS Regions.

Get started with blue/green deployments by updating your Amazon ECS service configuration in the Amazon ECS console.

Happy deploying!
— Donnie
