Tag Archives: AI/ML

Your Training Data Is Your Most Valuable IP

Post Syndicated from Maddie Presland original https://www.backblaze.com/blog/your-training-data-is-your-most-valuable-ip/

A decorative image showing different generic computer module icons.

AI training data is now a company’s most valuable intellectual property—often worth more than the models themselves. Models can be replicated and architectures become public knowledge, but the datasets that capture your domain expertise and years of careful curation are irreplaceable.

Yet as AI workflows become increasingly distributed, that data moves constantly between environments, increasing exposure while reducing visibility. According to IBM, “Forty percent of breaches involved data stored across multiple environments… highlighting the challenge of tracking and safeguarding data, including shadow data, and data in AI workloads.” Meanwhile MIT Sloan researchers have documented that AI training datasets are often inconsistently documented and poorly understood, creating exposure that extends beyond technical vulnerabilities into operational and compliance failures.

Yet many organizations still treat training datasets as just another storage bucket. But protecting data at rest is both a compliance requirement and a competitive necessity. The integrity of your datasets now determines the integrity of your models.

Free resource: Understand why object storage is a strategic driver

Download our free ebook to learn how object storage supports every stage of the AI pipeline—from data collection to model deployment.

Download the Ebook

Why training data is the new target

The attack surface for AI systems has fundamentally shifted. Rather than targeting models in production, sophisticated adversaries now focus on the training pipeline itself.

Data poisoning has emerged as an insidious threat

Attackers inject subtle changes like biased samples, mislabeled data, or adversarial examples that skew model outcomes or introduce hidden backdoors. Recent research reveals that 26% of organizations surveyed in the US and UK have been victims of AI data poisoning in the last year. These poisoned models can quietly undermine fraud detection, weaken cyber defenses, and corrupt business-critical decisions.

Intellectual property theft takes on new dimensions

When adversaries steal training datasets, they’re stealing the accumulated expertise that gives your models their edge. Your training data represents thousands of hours of curation and annotation that encodes institutional knowledge about your customers and market. A competitor with your datasets can replicate your capabilities in weeks rather than years.

Silent corruption poses an equally serious but less visible threat

Infrastructure failures, human errors, or gradual drift in data pipelines can corrupt training datasets without triggering alerts. For organizations in regulated industries such as healthcare, financial services, or autonomous systems, this creates a reproducibility crisis. How do you prove your model was trained on authentic, unaltered data when you can’t verify the data’s provenance?

The NIST AI Risk Management Framework emphasizes that maintaining the provenance of training data and supporting attribution of AI system decisions to subsets of training data can assist with both transparency and accountability. Regulators and customers increasingly expect verifiable proof of data integrity throughout the training lifecycle.

The takeaway? The trustworthiness of every model begins with the trustworthiness of its data.

The principles of a secure AI data foundation

A strong protection model rests on three pillars—immutability, encryption, and regional control—each reinforcing long-term integrity.

1. Immutability: Protect against tampering or deletion

Immutability means write-once, read-many (WORM) protection that prevents modification or removal. Once data is written, it becomes locked—no one can modify, overwrite, or delete it for a defined retention period, but it remains fully accessible for reading. This technical guarantee prevents data poisoning attacks, stops accidental deletion, and enables verifiable reproducibility.

CISA advisories recommend immutable backups to guard against ransomware, but the benefits extend much further for AI systems. When you lock a dataset snapshot before training begins, you guarantee the ability to reproduce that exact model state, which is critical for debugging, regulatory audits, and forensic investigations when models fail.

Object Lock capabilities enforce immutability at the storage layer for set retention periods. Each dataset version becomes permanently immutable, creating an unalterable record of your training history that no administrator or attacker can modify.

Implementation tip: Enable Object Lock at the bucket level and integrate it with your data-ingestion scripts to automatically lock datasets as they’re created.

2. Encryption: Safeguard confidential data

Training datasets contain extraordinary value—customer information, proprietary annotations, competitive intelligence embedded in data selection. Server-side encryption protects this data both in transit and at rest, defending against unauthorized access even if other security layers fail. The EU’s recent NIS2 technical guidance explicitly prescribes cryptography as a required control measure for compliance.

The key to practical encryption is simplicity. Solutions should integrate seamlessly into existing workflows without requiring separate key-management infrastructure or introducing performance overhead that disrupts training pipelines.

Implementation tip: Look for server-side encryption options (like SSE-B2 or SSE-C) that remain transparent to your applications while providing the protection regulators require.

3. Regional control: Ensure data sovereignty and availability

Where your data physically resides matters for compliance, latency, and operational resilience. GDPR and similar regulations often require that sensitive data remain within specific jurisdictions. Beyond compliance, regional placement affects training performance—positioning data near compute resources or using high-performance delivery mechanisms can reduce transfer delays when moving large datasets.

The critical factor is transparency. You need explicit control over region selection and assurance that data won’t be replicated to secondary regions without your knowledge. Ambiguous “regional” configurations that might span continents create compliance risk. 

Consider a U.S. biomedical AI startup working with patient-derived data. They need datasets stored exclusively in U.S. regions to satisfy HIPAA requirements, Object Lock enabled to prove data integrity for regulatory submissions, and encryption applied to protect sensitive patient information—all while maintaining the competitive advantage their proprietary data provides. Regional control with clear guarantees makes this achievable.

Implementation tip: Choose storage providers that let you explicitly select regions during bucket creation with clear guarantees about where data resides, including replication destinations.

Beyond security: Enabling trust and traceability

Immutable, encrypted, regionally contained object storage enables AI governance at a level traditional storage infrastructure cannot.

Each dataset snapshot becomes a verifiable record of model history. When a model behaves unexpectedly in production, you can trace back to the exact training data used to create it. This capability accelerates debugging and provides the evidence needed to explain model decisions to regulators, customers, or internal stakeholders.

Storage infrastructure with built-in immutability and access logging provides the verifiable evidence that auditors require. Instead of reconstructing data lineage from logs and documentation, you can demonstrate exactly what happened with cryptographic proof.

These capabilities transform storage from a passive repository into an active component of your AI governance framework.

Implementation snapshot: Putting it all together

Establishing these protections with Backblaze B2 follows a straightforward path:

  1. Create buckets in regions that match your compliance and latency requirements.
  2. Enable Object Lock and configure retention policies aligned with your model development lifecycle.
  3. Apply server-side encryption (SSE-B2 or SSE-C) to all training data buckets.
  4. Activate versioning to maintain a complete history of dataset evolution.
  5. Configure logging to track access patterns and enable lineage verification.
  6. Integrate with compute using standard S3 compatible tools.

For organizations running intensive training workloads, Backblaze B2 Overdrive provides high-throughput object storage with up to 1Tbps throughput speeds and unlimited free egress. This allows enterprises to perform large quantities of concurrent data operations without performance degradation, keeping compute resources—including expensive GPUs—from sitting idle while waiting for data transfers. B2 Overdrive maintains the same security and compliance capabilities as standard Backblaze B2 while enabling faster iteration on model development.

The bottom line: Trust begins with proven data

The datasets you’ve built represent years of institutional knowledge—far more difficult to replace than the models trained on them. Protecting that intellectual property requires more than access controls and perimeter security. You need to prove the integrity of your data to regulators who demand accountability, to customers who expect trustworthy AI, and to your own teams who need confidence in model reproducibility.

Immutability and encryption make that proof simple and reliable. With Backblaze B2, you gain a clear, verifiable foundation for protecting your training data with the same rigor you apply to your most critical assets. Learn more about where Backblaze B2 sits in the AI data pipeline, or talk to our cloud storage experts.

The post Your Training Data Is Your Most Valuable IP appeared first on Backblaze Blog | Cloud Storage & Cloud Backup

Introducing the AWS Infrastructure as Code MCP Server: AI-Powered CDK and CloudFormation Assistance

Post Syndicated from Idriss Laouali Abdou original https://aws.amazon.com/blogs/devops/introducing-the-aws-infrastructure-as-code-mcp-server-ai-powered-cdk-and-cloudformation-assistance/

Streamline your AWS infrastructure development with AI-powered documentation search, validation, and troubleshooting

Introduction

Today, we’re excited to introduce the AWS Infrastructure-as-Code (IaC) MCP Server, a new tool that bridges the gap between AI assistants and your AWS infrastructure development workflow. Built on the Model Context Protocol (MCP), this server enables AI assistants like Kiro CLI, Claude or Cursor to help you search AWS CloudFormation and Cloud Development Kit (CDK) documentation, validate templates, troubleshoot deployments, and follow best practices – all while maintaining the security of local execution.

Whether you’re writing AWS CloudFormation templates or AWS Cloud Development Kit (CDK) code, the IaC MCP Server acts as an intelligent companion that understands your infrastructure needs and provides contextual assistance throughout your development lifecycle.

The Model Context Protocol (MCP) is an open standard that enables AI assistants to securely connect to external data sources and tools. Think of it as a universal adapter that lets AI models interact with your development tools while keeping sensitive operations local and under your control.

The IaC MCP Server provides nine specialized tools organized into two categories:

Remote Documentation Search Tools

These tools connect to the AWS Knowledge MCP backend to retrieve relevant, up-to-date information:

  1.  search_cdk_documentation
    Search the AWS CDK knowledge base for APIs, concepts, and implementation guidance.
  2. search_cdk_samples_and_constructs
    Discover pre-built AWS CDK constructs and patterns from the AWS Construct Library.
  3. search_cloudformation_documentation
    Query CloudFormation documentation for resource types, properties, and intrinsic functions.
  4. read_cdk_documentation_page
    Retrieve and read full documentation pages returned from searches or provided URLs.

Local Validation and Troubleshooting Tools

These tools run entirely on your machine

  1. cdk_best_practices
    Access a curated collection of AWS CDK best practices and design principles.
  2. validate_cloudformation_template
    Perform syntax and schema validation using cfn-lint to catch errors before deployment.
  3. check_cloudformation_template_compliance
    Run security and compliance checks against your templates using AWS Guard rules and cfn-guard.
  4. troubleshoot_cloudformation_deployment
    Analyze CloudFormation stack deployment failures with integrated CloudTrail event analysis. This tool will use your AWS credentials to analyze your stack status.
  5. get_cloudformation_pre_deploy_validation_instructions
    Returns instructions for CloudFormation’s pre-deployment validation feature, which validates templates during change set creation.

Key Use Cases

  1. Intelligent Documentation Assistant

Instead of manually searching through documentation, ask your AI assistant natural language questions:

“How do I create an S3 bucket with encryption enabled in CDK?”

The server searches CDK best practic and samples, returning relevant code examples and explanations.

     2. Proactive Template Validation

Before deploying infrastructure changes:

User: “Validate my CloudFormation template and check for security issues”

AI Agent: [Uses validate_cloudformation_template and check_cloudformation_template_compliance]

“Found 2 issues: Missing encryption on EBS volumes,

and S3 bucket lacks public access block configuration”

 3. Rapid Deployment Troubleshooting

When a stack deployment fails:

User: “My stack ‘stack_03’ in us-east-1 failed to deploy. What happened?”

AI Agent: [Uses troubleshoot_stack_deployment with CloudTrail integration]

“The deployment failed due to insufficient IAM permissions.

CloudTrail shows AccessDenied for ec2:CreateVpc.

You need to add VPC permissions to your deployment role.”

     4. Learning and Exploration

New to AWS CDK? The server helps you discover constructs and patterns:

User: “Show me how to build a serverless API”

AI Agent: [Searches CDK constructs and samples]

“Here are three approaches using API Gateway + Lambda…”

Architecture and Security

Security Design

Local Execution: The MCP server runs entirely on your local machine using uv (the fast Python package manager). No code or templates are sent to external services except for documentation searches.

AWS Credentials: The server uses your existing AWS credentials (from ~/.aws/credentials, environment variables, or IAM roles) to access CloudFormation and CloudTrail APIs. This follows the same security model as the AWS CLI.

stdio Communication: The server communicates with AI assistants over standard input/output (stdio), with no network ports opened.

Minimal Permissions: For full functionality, the server requires read-only access to CloudFormation stacks and CloudTrail events—no write permissions needed for validation and troubleshooting workflows.

Getting Started

Prerequisites

  • Python 3.10 or later
    uv package manager
    AWS credentials configured locally
    MCP-compatible AI client (e.g., Kiro CLI, Claude Desktop)

Configuration

Configure the MCP server in your MCP client configuration. For this blog we will focus on Kiro CLI. Edit .kiro/settings/mcp.json):

{
  "mcpServers": {
    "awslabs.aws-iac-mcp-server": {
      "command": "uvx",
      "args": ["awslabs.aws-iac-mcp-server@latest"],
      "env": {
        "AWS_PROFILE": "your-named-profile",
        "FASTMCP_LOG_LEVEL": "ERROR"
      },
      "disabled": false,
      "autoApprove": []
    }
  }
}

Security Considerations

Privacy Notice: This MCP server executes AWS API calls using your credentials and shares the response data with your third-party AI model provider (e.g., Amazon Q, Claude Desktop, Cursor, VS Code). Users are responsible for understanding your AI provider’s data handling practices and ensuring compliance with your organization’s security and privacy requirements when using this tool with AWS resources.

IAM Permissions

The MCP server requires the following AWS permissions:

For Template Validation and Compliance:

  • No AWS permissions required (local validation only)

For Deployment Troubleshooting:

  • cloudformation:DescribeStacks
  • cloudformation:DescribeStackEvents
  • cloudformation:DescribeStackResources
  • cloudtrail:LookupEvents (for CloudTrail deep links)

Example IAM policy:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "cloudformation:DescribeStacks",
        "cloudformation:DescribeStackEvents",
        "cloudformation:DescribeStackResources",
        "cloudtrail:LookupEvents"
      ],
      "Resource": "*"
    }
  ]
}

Example Use Case With Kiro CLI

IMPORTANT: Ensure you have satisfied all prerequisites before attempting these commands.

1. With the mcp.json file correctly set, try to run a sample prompt. In your terminal, run kiro-cli chat to start using Kiro-cli in the CLI.

Figure 1: Kiro-CLI with AWS IaC MCP server

Figure 1: Kiro-CLI with AWS IaC MCP server

Scenarios:

  • “What are the CDK best practices for Lambda functions?”

Figure 2 Search the CDK best practices for Lambda functions

Figure 2: Search the CDK best practices for Lambda functions

  • “Search for CDK samples that use DynamoDB with Lambda”

Figure 3: Search for CDK samples that use DynamoDB with Lambda

Figure 3: Search for CDK samples that use DynamoDB with Lambda

  • “Validate my CloudFormation template at ./template.yaml”

Figure 4: Validate my CloudFormation template with AWS IaC MCP Server

Figure 4: Validate my CloudFormation template with AWS IaC MCP Server

  • “Check if my template complies with security best practices”

Figure 5: Check if my template complies with security best practices with AWS IaC MCP Server

Figure 5: Check if my template complies with security best practices with AWS IaC MCP Server

Best Practices

  • Start with Documentation Search: Before writing code, search for existing constructs and patterns
  • Validate Early and Often: Run validation tools before attempting deployment
  • Check Compliance: Use check_template_compliance to catch security issues during development
  • Leverage CloudTrail: When troubleshooting, the CloudTrail integration provides detailed failure context
  • Follow CDK Best Practices: Use the cdk_best_practices tool to align with AWS recommendations

What’s Next?

The IAC MCP Server represents a new paradigm in the AI agentic workflow infrastructure development – one where AI assistants understand your tools, help you navigate complex documentation, and provide intelligent assistance throughout the development lifecycle.

Get Involved

The AWS IaC MCP Server is available now:

  • Documentation and GitHub Repository: aws-iac-mcp-server
  • Feedback: We welcome issues and pull requests! Or respond to our IaC survey here.

Ready to supercharge your infrastructure as code development? Install the IaC MCP Server today and experience AI-powered assistance for your AWS CDK and CloudFormation workflows.

Have questions or feedback? Reach out to the blog authors on the AWS Developer Forums.

About Authors

Idriss Laouali Abdou

Idriss is a Sr. Product Manager Technical on the AWS Infrastructure-as-Code team based in Seattle. He focuses on improving developer productivity through AWS CloudFormation and StackSets Infrastructure provisioning experiences. Outside of work, you can find him creating educational content for thousands of students, cooking, or dancing.

Brian Terry

Brian Terry, Senior WW Data & AI PSA, is an innovation leader with more than 20 years of experience in technology and engineering. Brian is pursuing a PhD in computer science at the University of North Dakota and has spearheaded generative AI projects, optimized infrastructure scalability, and driven partner integration strategies. He is passionate about leveraging technology to deliver scalable, resilient solutions that foster business growth and innovation.

Building Multimodal AI Data Infrastructure with Pixeltable

Post Syndicated from Jeronimo De Leon original https://www.backblaze.com/blog/building-multimodal-ai-data-infrastructure-with-pixeltable/

A decorative image showing a chip with the word 'AI' and digital lines extending into the background.

We’re approaching a fascinating inflection point in AI development. Research from Epoch AI indicates that high-quality text data will be fully exhausted by 2026 to 2028. As recently as January, OpenAI co-founder Ilya Sutskever said at a conference that all the useful data online had already been used to train models. Over 35% of top websites now block AI scrapers. OpenAI is cutting deals with publishers like The Financial Times because freely available training data is running out.

So what comes next? Multimodal data: video, images, audio, sensor readings. Data that captures how the physical world actually operates, not just how we describe it in text.

Nvidia CEO Jensen Huang highlighted this shift when discussing Tesla’s AI advantage. He noted that the company has a “phenomenal position” because Tesla is collecting massive amounts of real-world data through its AI-enabled factories and autonomous vehicles.”

This real-world data, what some call “world data,” is multimodal at its core. It includes video from cameras capturing spatial relationships and motion, sensor telemetry recording physical interactions, images showing object states, and audio capturing environmental context. Video is particularly valuable because it captures temporal dynamics, depth perception, and how objects interact over time, insights that static text or images alone cannot provide.

Here’s the insight most organizations miss: you’re already generating this data.

Your organization is already producing multimodal data

Every single day, your organization produces massive amounts of multimodal data, including:

  • Zoom calls with video, audio, and screen shares
  • Security camera footage
  • Customer service interactions combining chat logs, voice recordings, website screen recordings and product images
  • Manufacturing sensors producing telemetry alongside quality inspection photos
  • Marketing teams creating videos, graphics, and campaign documents
  • Sales demos mixing presentations, product screenshots, and recorded conversations

And that’s just the short list.

The problem isn’t scarcity. It’s how multimodal data gets siloed, deleted, or stored in ways that make it unusable for AI applications. Video sits in one system and transcripts in another, with metadata scattered across databases. Most organizations treat this as operational exhaust rather than the strategic asset it represents.

Organizations that start systematically leveraging their multimodal data today will have capabilities tomorrow that generic models can never match.

The challenge: Multimodal infrastructure complexity

Building AI systems that work across images, video, audio, and text traditionally requires stitching together a fragmented technology stack. Videos live in object storage. Structured data sits in relational databases. Vector embeddings need specialized vector databases. Custom ETL pipelines handle transformations. Orchestration code coordinates everything. You need separate systems for caching, versioning, and lineage tracking.

This “data plumbing” consumes more engineering time than actual AI development. A straightforward workflow like building a searchable video archive with object detection and similarity search requires coordinating five or more systems and writing hundreds of lines of orchestration code.

The complexity creates a barrier that prevents most organizations from leveraging their multimodal data effectively, even when the underlying AI models are accessible through APIs. That’s the gap that Pixeltable solves.

How Pixeltable simplifies multimodal data workloads

Pixeltable replaces the fragmented multi-system architecture typically required for AI applications with a single declarative table interface. Instead of coordinating databases, file storage, vector databases, APIs, and orchestration tools separately, you work with tables where multimodal data lives alongside your transformations and AI operations.

The approach is straightforward. Store multimodal data in tables, define transformations as computed columns, and query everything together. Pixeltable handles the orchestration, caching, and model execution automatically.

Connect to data in-place 

Point Pixeltable at your existing object stores like AWS S3 or Backblaze B2 Cloud Storage without moving or duplicating data. Your files stay where they are, organized into queryable, versioned tables. No separate databases or vector stores needed.

Define workflows declaratively 

Transformations, model inference, and custom logic become Python computed columns. Extract frames from video, run object detection, generate embeddings, define it once and Pixeltable auto-orchestrates execution, manages dependencies, and handles incremental updates when new data arrives.

Query across everything

Leverage semantic search co-located with metadata. Raw data and AI-generated results in one interface. Build RAG systems with auto-synced embedding indexes that eliminate separate vector database management.

Focus on logic, not infrastructure

Full versioning for reproducibility. Automatic incremental processing means only necessary computations run when data changes. The same code works in development and production without rewrites.

For a practical example, explore our companion Github notebook Multimodal Data Processing with Pixeltable and Backblaze B2. It demonstrates how to extract and transform video frames using Pixeltable, then store the processed results in Backblaze B2 Cloud Storage with automatic URL generation.

Powering multimodal AI with Pixeltable and Backblaze B2

At Backblaze, we understand how essential multimodal data has become for AI development. Our collaboration with Pixeltable integrates B2 Cloud Storage directly into their open-source framework, giving organizations a simple and scalable foundation for managing complex AI workloads.

Pixeltable’s declarative design works seamlessly with Backblaze B2 across the entire AI data lifecycle. Whether you are processing video for model training, running inference on image streams, or building retrieval-augmented generation systems with multimodal embeddings, Backblaze B2 provides reliable S3 compatible storage that Pixeltable can reference directly without data duplication.

We are working closely with the Pixeltable team on a handful of initiatives to make multimodal workflows easier to deploy and scale. For those exploring this integration, we provide an example that demonstrates how Pixeltable and Backblaze B2 work together across the multimodal AI pipeline.

The data that fuels multimodal AI already exists across most organizations, from meeting recordings to customer interactions, video archives, and sensor logs. With Pixeltable and Backblaze B2, the infrastructure to harness that data effectively is now within reach.

Explore Pixeltable on GitHub or visit pixeltable.com to learn about declarative multimodal data infrastructure. For S3 compatible storage across your AI pipeline, check out Backblaze B2.

The post Building Multimodal AI Data Infrastructure with Pixeltable appeared first on Backblaze Blog | Cloud Storage & Cloud Backup

Amazon introduces two benchmark datasets for evaluating AI agents’ ability on code migration

Post Syndicated from Linbo Liu original https://aws.amazon.com/blogs/devops/amazon-introduces-two-benchmark-datasets-for-evaluating-ai-agents-ability-on-code-migration/

Introduction: Repository-Level Code Migration

Code migration is a repository-level transformation process that modernizes entire software projects to run on new platforms, frameworks, or runtime environments while preserving their original functionality and structure. Rather than focusing on isolated files or APIs, it operates across the full repository, spanning source code, dependencies, build systems, and configuration files to ensure consistency and correctness at scale. Typical examples include upgrading Java repositories from legacy versions such as Java 8 to modern Long-Term Support releases like Java 17 or 21, migrating .NET Framework repositories to .NET Core, and upgrading AWS Lambda projects in Python or Node.js to the latest runtime versions.

Code migration is a challenging software engineering (SWE) task that involves runtime upgrade, deprecated API replacement, test framework optimization, and syntax modernization. As we build agentic solutions for code migration, the community needs a standardized benchmark dataset and an evaluation framework to measure how well these systems actually perform. To close this gap, we introduce two benchmark datasets: MigrationBench on Java and Poly-MigrationBench as an extension to other programming languages. These datasets are designed not only to benchmark the effectiveness of Large Language Models (LLMs) in repository-level migration, but also to provide the community with a standardized evaluation framework for reproducible experiments.

Solution Overview

MigrationBench: Repository-Level Java Migration

MigrationBench is a comprehensive repository-level benchmark focused on Java upgrades. Specifically, it evaluates the ability of LLMs and other tools to migrate code from Java 8 to newer Long-Term Support (LTS) versions such as Java 17 and Java 21.

The full dataset includes 5,102 open-source Java 8 Maven repositories collected from GitHub, alongside a representative subset of 300 repositories curated for research requiring fewer compute resources. MigrationBench also provides an evaluation framework for validating Java Maven repository upgrades.

Our data collection process follows a carefully designed pipeline with multiple filtering stages to ensure the quality and relevance of the repositories we include. We begin by collecting Java Maven projects, focusing on repositories written in Java that use Maven as their build tool. Next, we apply a license filter, retaining only repositories under MIT or Apache 2.0 licenses to ensure open and permissible usage. We then apply a quality filter, keeping only repositories with at least three GitHub stars to exclude toy or inactive projects. For each repository, we search for the latest buildable commit that is compatible with Java 8, ensuring a valid starting point for migration. We also remove redundant repositories based on their snapshot hashes. Finally, we further exclude repositories without any unit tests or integration tests, which are essential components to validate migration correctness in a robust way. For more details, checkout our paper MigrationBench: Repository-Level Code Migration Benchmark from Java 8 and the GitHub repository.

Poly-MigrationBench: Extending Beyond Java

While MigrationBench focuses exclusively on Java, the real-world code migration problem spans multiple ecosystems. To address this broader scope, we develop Poly-MigrationBench, an extension that introduces additional languages and platforms. We applied a similar data curation process as MigrationBench to additionally collect

  • 100 .NET Framework repositories. They are to be migrated to .NET core.
  • 74 Node.js repositories with version less than Node.js 22. They are to be migrated to Node.js 22.
  • 83 Python repositories with Python version less than 3.13. They are to be migrated to Python 3.13.

The above datasets are publicly available on GitHub: https://github.com/amazon-science/Poly-MigrationBench

Together, these datasets enable researchers to explore cross-language and cross-platform migration challenges at scale.

Use Case 1: Cross-Platform .NET Migration

One pressing migration challenge lies in moving .NET applications from Windows environments running on the legacy .NET Framework to Linux environments powered by .NET Core. This migration is critical for organizations seeking cross-platform compatibility, improved performance, and modern deployment practices such as containerization.

To support research in this area, we curated a benchmark of 100 open-source .NET Framework repositories from GitHub. These projects were carefully selected for diversity and quality, offering a real world foundation for evaluating migration tools and automated systems. The migration goal is clear: transition .NET Framework repositories to .NET Core on Linux while preserving functional equivalence.

Use Case 2: Node.js Upgrade for AWS Lambda Applications

Another timely migration need involves Lambda functions written in Node.js. Node.js 20, currently supported by Lambda, is scheduled for end-of-support in April 2026 (reference). After this deadline, projects running on Node.js 20 will no longer receive critical security patches or bug fixes.

For increased security and to avoid accumulating technical debt, developers building Lambda applications are proactively upgrading to Node.js 22. To evaluate LLMs’ effectiveness in automating this migration, Poly-MigrationBench provides a dataset of 74 open-source Node.js repositories using Node.js versions no later than 20. The task is to upgrade them to Node.js 22 while ensuring functional correctness is preserved.

Use-case 3: AWS Lambda Python Migrations

We also release benchmarks on Lambda Python repositories to the community to facilitate research and evaluation of automated Lambda function migrations in Python code. According to AWS documentation, Python 3.10 and 3.11 are scheduled to reach end of support for Lambda in June 2026. This approaching deadline highlights the urgency of migrating existing Lambda functions to newer runtimes and underscores the critical need for scalable, reliable, and LLM-driven migration solutions. To facilitate evaluation on this task, we collect 83 Python AWS Lambda repositories with Python version no later than 3.12. The objective is to migrate these repositories to Python 3.13.

Get Started

We’ve open-sourced both the datasets and the evaluation framework on Hugging Face and GitHub to make it easy for the community to explore, reproduce, and extend our work. Alongside them, we also released a baseline solution, SD-Feedback, for MigrationBench, while leaving the development of more sophisticated agentic migration systems as a open challenge for the research community.

MigrationBench

To download the MigrationBench dataset, visit our Hugging Face collection. For evaluation, simply clone our GitHub repository and follow the steps in the README.md.

Poly-MigrationBench

To access the Poly-MigrationBench dataset and evaluation commands, clone our GitHub repository.

For a deeper dive into how the benchmarks were curated and how the evaluation framework was designed, check out our paper:

MigrationBench: Repository-Level Code Migration Benchmark from Java 8

Conclusion

Code migration is an essential but complex task for maintaining long-term software reliability and security. With MigrationBench and Poly-MigrationBench, we aim to provide the community with systematic, large-scale benchmarks that enable reproducible research and practical evaluation of automated migration approaches.

Authors

Linbo Liu

Linbo Liu is an Applied Scientist at Amazon Web Services. He works on coding agents optimization and post-training.

Yiyi Guo

Yiyi Guo is a Senior Product Manager at Amazon Web Services. She works on agentic AI, software migration and modernization in AWS Transform.

Luke Huan

Luke Huan is a Senior Principal Scientist at Amazon Web Services. He works on agentic AI, generative AI, AI4code and supports AWS Transform.

Making the Backblaze Network AI Ready

Post Syndicated from Brent Nowak original https://www.backblaze.com/blog/making-the-backblaze-network-ai-ready/

An illustration of a chip with AI written on it.

AI isn’t just reshaping how data is processed—it’s rewriting how data moves. Behind every training run or inference pipeline is a torrent of data, and how efficiently (or not) that data travels through networks (and whether it’s an AI-ready network) can make or break performance. 

Data workloads have massively evolved over the 18 years we’ve been in business from computer backups to exabyte-scale storage to AI data pipelines. And that has implications for not just our storage hardware, but our network. 

What started as a single ISP serving a few racks in the early days has grown into a global, multi-terabit backbone connecting customers, compute, and storage in real time via multiple Tier 1 carriers, Internet Exchanges, and PNI links. 

So why talk about it now? Because AI is testing the limits of every part of the infrastructure stack—and the network is where those limits are most visible. Running an AI-ready network means rethinking how you design, route, and scale traffic to handle not just more data, but faster, more synchronized, and more resilient data movement than ever before.

In this post, I’m talking about how our network has evolved to support AI workflows, including what’s changed under the hood, how we’re adapting our hardware and architecture, and what that means for the way data moves through Backblaze today.

Go with the flow

The Network Engineering (NetEng) group at Backblaze is responsible for the design, implementation, and support of our physical network—everything from the physical copper and fiber cables inside our datacenters to the routers and switches that connect our storage to the world.

When we talk about network traffic, we often refer to a “flow”—a stream of information sent between two or more parties. Downloading a file? That’s a flow between your computer and the server offering the file. Multiple small requests loading a website (text, formatting code, animation code, etc.)? Those are known as “mouse” flows. Massive dataset transfers that sustain hundreds of gigabits per second? Those are “elephant” flows. 

The elephant in the room

AI workloads are the largest “elephant” flows our network has ever sustained. These aren’t just big files, they’re ecosystems of data: multi-petabyte datasets, hundreds of thousands of objects ranging from a single megabyte to hundreds of megabytes per object, and thousands of simultaneous connections working in parallel.

Moving these data sets around is no small task. It means engineering for sustained, lossless throughput. It’s cutting edge, using many machines to perform parallel operations, all at large transfer rates. Let’s say we’re the source of a dataset that is being transferred to a neocloud for processing, the processing layers (often GPUs) want a continuous stream of high bandwidth with no loss. And a single dropped packet in a training pipeline can trigger expensive re-requests, idle GPUs, and cascading slowdowns. 

With that in mind, we’ve evolved our infrastructure from traditional cloud networking—designed for smaller flows—to handle the relentless firehose of AI data.

Traditional cloud vs AI cloud

AI changes everything about traffic behavior. It doesn’t just mean that our total capacity is bigger, but also that our considerations for how we design, support, and scale our infrastructure morphed along with our capacity upgrades.

Here’s a quick overview of the former challenges and the new ones we’re engineering to serve our AI workflows.

Traditional Cloud Network AI Cloud Network
Small to large flow sizes (megabits to, gigabits) Very large flows (multi-gigabit to terabit)
High entropy flows (many sources and destinations) Low entropy flows (consistent source/destination pairs)
Predictable usage patterns Burst traffic patterns
Tolerant to failures Sensitive to faults, buffering, congestion

In short: AI traffic is heavier, stickier, and far less forgiving. So the goal is to design networks that can transfer 100Gbps, 200Gbps, and up to 1,000 Gbps (1 Terabit) a second with a low latency, low jitter, and a zero loss profile. Simple right? 

Hardware network upgrades

To meet these new demands of AI workflows, we’ve upgraded nearly every layer of our physical infrastructure. We needed to increase the density of our networking hardware, deploy denser fiber optic solutions, and upgrade the capacity of our edge network.

What technologies are we deploying?

1. Transitioning from NRZ to PAM4 Optics

The fiber optic modules that are used to connect all our infrastructure hardware (servers, switches, routers) have been transitioned to modules that support a denser encoding method. Both NRZ and PAM4 are technologies used to modulate signals. Think of NRZ as a one-lane highway with one passenger per car. PAM4 adds three more passengers per car, doubling the rate without doubling lanes and with controllable cons such as increased noise sensitivity. By using four voltage levels instead of two, PAM4 transmits twice the information per signal change, effectively doubling bandwidth per fiber strand.

2. MTP-8 and MTP-16 Fiber

MTP is a fiber connector type and the number after denotes the number of fiber optic strands contained within the cable. The higher the number, the more fiber pairs in the cable. We’ve used MTP-8 for years (four pairs of fiber), but to handle AI-scale traffic, we’re now deploying MTP-16 for higher-density connections. That means where we once ran 100G links, we now run 400G—and can scale up to multiple 100G paths as workloads grow (4x100G, 8x100G, etc).

3. Expanding edge and core capacity

We’ve refreshed routers and switches to handle higher port speeds and density—moving from 100G to 400G interfaces across our interconnects. The result: higher aggregate throughput and better fault isolation for massive parallel transfers.

Visualizing an AI workflow

Our monitoring tools track network flows (TCP conversations) in real time, giving us visibility into how large AI workflows move across the infrastructure. We use this type of information to monitor and make sure that large workflows are distributed across our physical infrastructure to allow for traffic balancing.

So, what does a large “AI workflow” look like? It’s not one device talking to one device at a high rate, but rather a collection of actors all working together.

On our side, our API layer speaks to our storage layer, requesting the files. Once the files are retrieved from our storage layer, they flow through our API servers and are then sent to a destination. In order to achieve a high throughput, many API servers talk to many destination servers. 

A typical 200+ Gbps transfer (diagrammed below) might involve four API virtual IPs (VIPs), each hosted on multiple backend servers sending 5–7 Gbps to ten destination nodes for a total output of 52Gbps from each API server. On the receiving side, each destination server might be ingesting 20Gbps across multiple streams.

The key insight: AI data transfer isn’t one big pipe—it’s a distributed mesh of many coordinated streams. Our design scales linearly—add more API servers, add more destination nodes, and the flow grows predictably without congestion or packet loss.

Conclusion 

AI workflows have redefined what “fast” means on the network. At Backblaze, we’ve evolved from a single-ISP startup to an AI-scale infrastructure provider by continuously pushing the boundaries of connectivity, throughput, and reliability.

As our customers push the frontiers of AI, we’ll keep tuning the invisible layer that makes it possible: the AI-ready network.

The post Making the Backblaze Network AI Ready appeared first on Backblaze Blog | Cloud Storage & Cloud Backup

Why CoreWeave’s Object Storage Launch is Good for AI—and Everyone Building It

Post Syndicated from Maddie Presland original https://www.backblaze.com/blog/why-coreweaves-object-storage-launch-is-good-for-ai-and-everyone-building-it/

A decorative image showing cloud storage and AI icons.

CoreWeave just launched their own AI Object Storage. Our take? We love to see it. 

At first glance, it might look like a competitive offering, but as far as we’re concerned, the more storage options out there, the better for builders. It’s another sign that object storage has officially arrived as a key ingredient in the AI stack. 

Now, your AI stack can look like this: Fast, flexible storage close to your GPUs from CoreWeave (essential for training and inference). And when the run’s over? Move it to Backblaze B2 Overdrive to keep it ready for your next run at the right temperature and price-to-performance ratio.

More options mean more ways to build smart, cost-efficient pipelines that let teams train faster and iterate more without getting locked in. We’ll always cheer for that. 

Why object storage is essential for AI workloads

How do you balance scalability with performance while staying on budget? This ebook explores how object storage enhances every stage of the pipeline from collection to training to deployment, and provides real-world use cases.

Get the Ebook

Why object storage matters in the AI stack

Every AI model depends on moving massive datasets through training, inference, and retraining cycles. Each stage requires fast, reliable access to data. That’s where object storage comes in.

Object storage enables this by offering:

  • Elastic scalability for petabyte-scale data.
  • Reliability and durability across long model lifecycles.
  • Lifecycle management features to balance cost, performance, and accessibility.

As AI projects scale, smart data management becomes just as important as GPU performance. High-end GPUs can only deliver full value when they’re continuously fed the right data at the right time. When data sits in the wrong tier or takes too long to retrieve, compute resources go underused. And that means wasted time and money.

Balancing performance and cost in AI workloads

CoreWeave’s Local Object Transport Accelerator (LOTA) delivers up to 7GB/s throughput per GPU, helping data move quickly between storage and compute. With pricing around $110 per terabyte (about $60 with discounts) and regional capacity up to 10TiB, it’s built for performance-critical workloads where proximity to GPUs makes a measurable difference.

Its launch adds more choice to the ecosystem and highlights the growing demand for storage built specifically for AI. As more specialized options emerge, organizations are thinking carefully about how to right-size their infrastructure for each stage of the AI lifecycle.

When maximum performance is the goal, GPU-adjacent storage like CoreWeave’s can help teams squeeze out every last bit of speed during intensive training cycles. But for most AI workloads, B2 Overdrive provides the right balance of cost and performance. It offers the throughput and durability needed to support active training while keeping pricing predictable and scalable.

Many AI builders combine these strengths through a multi-cloud setup. Teams might use CoreWeave Object Storage when latency and proximity to GPUs deliver measurable gains, and then keep the rest of their AI pipeline on B2 Overdrive so datasets remain readily available for retraining, testing, or deployment.

Example configuration:

  • CoreWeave Object Storage for specialized, compute-intensive training where every millisecond counts. It’s ideal for short bursts of high-throughput processing, such as large-scale model fine-tuning or time-sensitive inferencing.
  • B2 Overdrive for the broader AI workflow, including day-to-day training, staging, versioning, and long-term dataset management. It provides the performance needed for ongoing model development while keeping data costs predictable and accessible across teams and environments. 

B2 Overdrive offers: 

  • Storage at roughly $15 per terabyte
  • High throughput and rapid access for post-training workflows
  • Simple APIs and event notifications to automate data movement across environments

This kind of architecture gives teams the freedom to use each platform where it shines. Backblaze handles the heavy lifting for most workloads, while CoreWeave adds targeted acceleration when raw GPU performance is the top priority. The result is a flexible, cost-aware workflow that supports both innovation and scale.

AI infrastructure that plays to every strength

The most effective AI setups use the right cloud for the right job. They run training where GPUs can perform at their peak, and store data where it stays organized and ready to move when needed.

B2 Overdrive provides a foundation for this strategy, offering a layer of object storage that keeps data secure, accessible, and easy to integrate across environments. Teams can combine each platform’s strengths to achieve speed when it’s needed, scalability that endures, and freedom from lock-in and runaway costs.

The AI ecosystem is expanding, and with the right partners, so are the possibilities.

See how Backblaze B2 Overdrive keeps AI data fast, flexible, and affordable.

The post Why CoreWeave’s Object Storage Launch is Good for AI—and Everyone Building It appeared first on Backblaze Blog | Cloud Storage & Cloud Backup

Three Hidden Costs in AI Video Storage

Post Syndicated from Maddie Presland original https://www.backblaze.com/blog/three-hidden-costs-in-ai-video-storage/

A decorative image showing various types of media.

Generative AI video is exploding. Platforms can turn prompts into polished clips, and models churn through massive training sets of images and footage. Behind the magic, though, is the unavoidable reality of storing and moving petabytes of data. 

Training runs require archiving colossal datasets, then pulling them back in full when it’s time to retrain. Once models go live, the output itself becomes another major workload to manage, whether that’s endless libraries of user-generated videos or fast-cycling streams of ephemeral content. These challenges are part of life for every GenAI company, but the costs of handling them vary widely depending on the provider.Those cloud storage costs can spiral quickly out of control. The big cloud providers lure teams in with low headline rates, but the fine print tells a different story. Pricing depends on which storage tier you pick, how often you move data between regions, and how many API requests your pipeline makes. Founders end up building workflows around cloud quirks instead of what’s fastest and simplest for their teams.

Free ebook: The Cost of Cloud Storage for AI

Struggling to keep AI storage costs under control? Download our free ebook to discover how to optimize cloud storage for AI workloads—without compromising performance.

Download the Ebook

Hidden cost #1: Storage tiers and complexity

AI video data doesn’t behave neatly. Training sets might sit untouched for long stretches before being needed again all at once. User-facing content might accumulate forever, or spike and crash depending on the latest trend. For lean engineering teams, predicting these swings is nearly impossible.

On major cloud providers, the stakes are high. Choose a hot tier and you’ll overpay when data goes cold. Pick an archive tier and you’ll face delays and penalties when you suddenly need that dataset tomorrow. Constantly shifting petabytes between tiers adds both operational overhead and surprise costs.

The numbers tell the story: a 5PB archive costs about $120K a month on AWS S3 Standard for storage alone, before any egress charges. The same capacity runs closer to $30K on Backblaze B2 Cloud Storage—a $90K delta that could fund another GPU cluster or extend a startup’s runway.

Backblaze B2 comes in at around one-fifth the cost of S3, with no tiering games to manage. And when workloads demand maximum throughput, B2 Overdrive scales while delivering a stronger price-to-performance ratio than others offer. That means less time modeling cost scenarios and more time iterating on product and model design.

Hidden cost #2: Egress fees

AI development thrives on iteration. Training and retraining cycles shuffle enormous datasets across clusters, often more than once a month. Each transfer can rival the cost of storage itself. And the faster a team wants to move, the more those bills stack up.

The big cloud providers introduce friction at every step. They charge not only when data exits their cloud but also when it crosses between their own regions. At petabyte scale, those tolls can reach five or even six figures in a single month, forcing founders into an impossible tradeoff: experiment less or drain the budget.

Consider that moving just 1PB once per month on AWS in the US East (N. Virginia) region racks up around $53.8K. Double that transfer frequency and you’re staring at over $100K in egress fees. That’s budget better spent on hiring, acquiring customers, or building better products.

Backblaze removes this bottleneck. Backblaze B2 already includes free egress to leading GPU and CDN partners. For companies operating at AI scale, B2 Overdrive goes further with unlimited free egress to any destination. That means models can be trained, tuned, and distributed globally without a single surprise charge standing in the way of progress.

Mirage, an AI video platform, experienced this firsthand. By eliminating egress costs, they cut storage-related expenses by up to 95% compared to their previous provider—freeing resources to reinvest in growth and product innovation.

Hidden cost #3: API requests and transaction fees

Not every AI video workflow interacts with storage the same way. Some stream large video files in big chunks, keeping the number of calls manageable. Others slice data into millions or billions of tiny objects—frames, embeddings, or metadata—and rely heavily on listing and indexing operations. In those cases, what looks like spare change per request quickly compounds into thousands of dollars in charges every month.

Major cloud storage providers are relentless here. Every PUT, GET, LIST, or HEAD operation comes with a fee, no matter how small. At scale, those fractions of a cent add up fast, leaving engineers designing around billing quirks instead of choosing the cleanest solution for their pipelines.

Picture a pipeline that generates one billion writes and two billion reads in a single month. On AWS, the tab for those transactions alone would run close to $5.8K. On Backblaze B2, writes are free and reads cost just $0.004 per 10,000 requests, bringing the same workload down to about $800. And the first 2,500 Class B and Class C transactions each day are free, further shrinking the bill. On B2 Overdrive, all API calls are included at no additional cost.

Whether your architecture leans toward billions of tiny objects or more efficient streaming, Backblaze keeps request charges predictable and manageable. That makes API calls something your team doesn’t need to obsess over, which is exactly how it should be.

Bringing it together: Simple, predictable economics

Taken together, these hidden costs show why storing AI video on “the big three” often feels like playing a rigged game. The pricing looks straightforward until the bills arrive, padded with charges for tiers, transfers, and transactions. Each one eats away at budget and slows the pace of innovation.

Backblaze offers a different path. By stripping out the fine print and focusing on price-to-performance, it makes storage a stable foundation instead of a moving target. Mirage proves what that means in practice: eliminating egress fees drove huge savings and freed resources to reinvest in their product.

For founders, that kind of predictability turns storage from a frustrating line item into the fuel for faster iteration, bolder experimentation, and sustainable growth.

The post Three Hidden Costs in AI Video Storage appeared first on Backblaze Blog | Cloud Storage & Cloud Backup

Boosting Unit Test Automation at Audible with Amazon Q Developer

Post Syndicated from Kirankumar Chandrashekar original https://aws.amazon.com/blogs/devops/boosting-unit-test-automation-at-audible-with-amazon-q-developer/

Audible, an Amazon company, is a leading producer and provider of audio storytelling. With a vast library of over 1,000,000 titles including audiobooks, podcasts, and Audible Originals with specific curated offerings available in each marketplace, Audible makes it easy to transform everyday moments into extraordinary opportunities for learning, imagination, and entertainment through immersive audio experiences. Robust testing is critical to ensure millions of end users enjoy a seamless experience across devices.

Remember the last time you inherited a software application codebase with minimal test coverage? Or perhaps you’ve written code in a rush to meet a deadline, promising yourself you’d add tests “later”? We’ve all been there. Testing is crucial but can often gets deprioritized when deadlines loom. That’s where Amazon Q Developer‘s agentic workflows come in, transforming the way developers approach test generation. This blog explores how Audible used Amazon Q Developer to boost their unit test coverage.

Business Use Case for Software Testing

In high velocity development environments, testing cycles can often times get compressed under tight deadlines, increasing quality risks. Amazon Q Developer transforms this paradigm by accelerating testing while maintaining comprehensive standards. Through automated test generation, edge case identification, and fix suggestions, teams execute thorough testing in reduced timeframes, delivering expedited releases, optimized QA resources, and enhanced production readiness.

Each function that does not have the appropriate testing implemented, represents the potential for a rework, bugs, and maintenance challenges. Additionally, inherited codebases present particular challenges: developers must choose between spending weeks writing tests for existing functionality or continuing the cycle of untested code.

Amazon Q Developer addresses these challenges by reducing the time and effort required for proper test coverage, transforming testing from a burdensome chore into a streamlined process that allows teams to focus on delivering new features while helping to ensure code quality.

Amazon Q Developer: Expanding test coverage for your codebase

Amazon Q Developer introduces an advanced approach to software testing generation through its agentic workflows. Unlike traditional test generation tools that produce generic tests, Amazon Q Developer analyzes your code’s intent, business logic, and edge cases. It doesn’t just generate tests; it creates meaningful test suites that validate your code’s behavior comprehensively.

Beyond the dedicated test generation workflow we’ll explore today, Amazon Q Developer offers multiple ways to assist with testing. You can use conversational prompts for test plan generation, request test improvements for existing code, or even pair-program with Amazon Q Developer as you write tests. The flexibility to integrate AI assistance throughout your testing workflow makes Amazon Q Developer a versatile companion for developers.

Amazon Q Developer workflow architecture

The following architecture diagram illustrates how Audible leveraged Amazon Q Developer for both test generation and code transformation:

A flowchart showing Amazon Q Developer's test generation and code transformation process. The diagram illustrates three main workflows: Test Generation: Shows how legacy Java codebase (11 packages) is analyzed using AI capabilities to generate comprehensive test suites including unit tests and edge cases. Code Transformation: Depicts the process of migrating JDK 8 code to JDK 17, including JUnit 4 to JUnit 5 conversion, with over 5,000 tests transformed. Results: Demonstrates outcomes including improved test coverage, successful JDK migration, and 50+ hours saved through code transformation. The diagram uses icons and color-coded sections to show the flow between different stages, with blue sections for initial inputs, green for processing, and grey for outputs.

The Amazon Q Developer workflow demonstrates two key capabilities:

  • Test Generation: Amazon Q Developer analyzes Java classes and creates comprehensive test suites including unit tests, edge case tests, and exception handling tests.
  • Code Transformation: Amazon Q Developer performs automated migration tasks including JDK 8 to JDK 17/21 upgrades, handling language version compatibility, JUnit 4 to JUnit 5 conversion, modernizing test framework syntax and annotations, syntax migration, updating deprecated APIs and code patterns.

What makes this workflow particularly powerful is how it combines AI capabilities with human expertise, allowing expert developers to leverage AI in their day-to-day workflow. Amazon Q Developer analyzes your codebase and uses it as a context, identifies edge cases, and performs automated transformations, while developers apply their domain knowledge to ensure the outputs align with business requirements and expected behavior.

Audible’s Approach to harness the potential of Amazon Q Developer

The Audible teams followed the below steps to harness Amazon Q Developer to boost test coverage.

Code Submission: The Audible team leveraged Amazon Q Developer to enhance their test coverage by generating additional unit tests for Java classes, including static methods and methods with existing test cases. This approach complemented their robust testing strategy. Amazon Q Developer has the ability to examine classes, methods, parameters, return types, and exceptions. Amazon Q Developer is helpful in automatically identifying unit tests to cover edge cases that can easily be overlooked, such as null input checks and empty string checks.

Targeted Requests: The Audible team specifically asked Amazon Q Developer to provide:

  • Suggestions for unit tests to cover the given method within a Java class
  • Recommendations for unit tests targeting untested edge cases
  • Recommendations for test cases addressing error handling and exception scenarios

The Audible team achieved significant improvements using Amazon Q Developer for both test generation and code transformation. The key to their success was providing rich context along with targeted prompts in a systematic workflow.

Developer Workflow

A horizontal workflow diagram titled 'Developer Context Creation Workflow' showing 5 sequential steps: Developer opens class file Developer selects method and adds prompt Selected method and prompt are submitted to Amazon Q Developer Amazon Q Developer generates tests Developer reviews and integrates the generated tests Each step is connected by colored arrows and represented by simple icons including user figures, file symbols, and tool icons. The process flows from left to right, illustrating the complete cycle from initial file access to final test integration.

Audible adopts a human in the loop approach to review the output from automation tools. The above workflow shows the complete process: (1) open a class file in their IDE, (2) select a specific method and add their prompt, (3) submit this combined context to Amazon Q Developer, (4) receive generated tests, and (5) review and integrate the tests into their codebase.

Effective Prompts and Approach

The Audible team followed a structured approach, using targeted requests that Amazon Q Developer could act upon:

Code Submission: The team provided Java classes to Amazon Q Developer with code to generate tests for individual methods, including static methods and those that already had some tests but lacked full coverage. Amazon Q Developer examined classes, methods, parameters, return types, and exceptions, automatically identifying unit tests to cover edge cases like null input checks and empty string checks.

Below are generic Sample Prompts for Specific Requests:

Basic Test Generation:

Generate unit tests for the following Java method. Focus on covering all possible input scenarios and edge cases:

[method code here]

Please include tests for:
- Valid input scenarios
- Null input checks
- Empty string validations
- Exception handling

Edge Case Focus:

I have this method that processes user input. Can you suggest unit tests that cover edge cases I might have missed? Pay special attention to boundary conditions and error scenarios:

[method code here]

Manual Framework Migration (via Q Developer Chat):

Convert this JUnit 4 test to JUnit 5 format. Make sure to update annotations and use modern JUnit 5 features where appropriate:

[JUnit 4 test code here]

Note: While Amazon Q Developer’s code transformation feature can handle JUnit4 to JUnit5 migration automatically across entire codebases, Audible also used the conversational interface for manual, targeted conversions as shown above. Both approaches are available. Refer to documentation for automated transformation details.

Test Generation: Based on the team’s requests, Amazon Q Developer generated specific test suggestions addressing these areas with appropriate assertions and test methods.

Implementation: The development team implemented the suggested tests after review.

Documentation: Amazon Q Developer has the ability to add comments to explain the purpose of the test, area of the functionality that the test is covering. In addition, Amazon Q Developer also has the ability to generate documentation related to other aspects like read-me files and project documentation.

Quantifiable Results

By leveraging Amazon Q Developer, the Audible team achieved:

  • Over 10 key packages received comprehensive unit test coverage
  • ~1 hour saved per test class (typically containing 8-10 individual tests)
  • 5,000+ test cases successfully migrated from JUnit4 to JUnit5 using both Amazon Q Developer’s code transformation and manual conversational assistance
  • 50+ hours of manual work saved during the JDK8 to JDK17 migration using Amazon Q Developer’s code transformation
  • Reduced human errors through AI-assisted transformations

Key Capabilities Demonstrated

Amazon Q Developer excelled in several areas that can be overlooked in manual testing:

Comprehensive Exception Testing: Beyond standard null input checks and empty string validations, it automatically suggested tests for IllegalArgumentException, NullPointerException, and custom business exceptions, including verification of both exception throwing and specific error messages. This systematic approach made test coverage more complete and error handling more robust.

Automated Edge Case Detection: Amazon Q Developer made inline suggestions for null pointer exception handling without prompting, making the process smoother and faster.

Manual Framework Migration with AI Assistance: Amazon Q Developer’s pattern recognition accelerated the migration process through conversational assistance. The team could ask Amazon Q Developer through the chat to convert test syntax from JUnit4 to JUnit5 manually. For example, their previous setup had JUnit4 syntax with @UseDataProvider and @DataProvider annotations. All they had to do was highlight the code block, Send to Prompt, and ask Amazon Q Developer to make the test JUnit5 compatible. Within seconds, it generated a reliable JUnit5 test with ParameterizedTest annotation and Stream of Arguments that they could manually implement.

Contextual Analysis: Amazon Q Developer analyzes the existing codebase and recognized patterns and generated tests that matched the team’s coding style and testing conventions.

Conclusion

Amazon Q Developer transforms the test generation process from a time-consuming chore into a streamlined workflow, enabling teams to achieve comprehensive test coverage with minimal effort. This allows developers to focus on higher-value activities while improving code quality and reliability.

The business impact is substantial: As testing becomes less burdensome, teams naturally adopt better testing practices, creating a positive feedback loop that enhances overall code quality, and creates an opportunity for faster development cycles, and reduced time spent on maintenance.

To learn more about Amazon Q Developer’s features and pricing details, visit the Amazon Q Developer product page.

About the Authors

kirankumar.jpeg

Kirankumar Chandrashekar is a Generative AI Specialist Solutions Architect at AWS, focusing on Next Generation Developer Experience tools like Q Developer, Kiro and Developer Productivity using AI. Bringing deep expertise in AWS cloud services, DevOps, modernization, and infrastructure as code, he helps customers accelerate their development cycles and elevate developer productivity through innovative AI-powered solutions. By leveraging Amazon Q Developer, he enables teams to build applications faster, automate routine tasks, and streamline development workflows. Kirankumar is dedicated to enhancing developer efficiency while solving complex customer challenges, and enjoys music, cooking, and traveling.

alex-torres.jpeg

Alex Torres is a Senior Solutions Architect at AWS, supporting Amazon.com in architecting, designing, and building applications on AWS. With deep expertise in security, governance, and Agentic AI for developers, he helps customers leverage cutting-edge cloud technology to create products that shape people’s lives. Passionate about empowering teams to solve complex challenges through innovative AWS solutions, Alex is dedicated to driving customer success while maintaining the highest standards of security and governance. Outside of work, he enjoys cooking and hiking.

GK is a Senior Customer Solutions Manager and strategic customer advisor supporting Amazon as a customer of AWS. Over her four years at AWS, she has focused on improving developer productivity and advocating for Amazon’s needs across AWS services to enhance user experience and drive deeper alignment between the two organizations. Her work with advanced Amazon teams helps deliver solutions that ultimately benefit both internal and external AWS customers. GK is particularly interested in how GenAI is bridging the gap between developers and non-developers, and she spends much of her time solving challenges in GenAI and security. She is based in the San Francisco Bay Area and enjoys hiking and camping.

Aditi Joshi is a Software Engineer at Audible, working on expanding Audible’s presence across Amazon platforms. As a full-stack developer, she primarily works with web technologies, cloud services, and programming languages like JavaScript and Java to build and enhance cross-platform integration features, including recent projects like introducing Audible purchase capabilities in the Amazon iOS app. With expertise in user interface development, responsive design, and web technologies, she focuses on showcasing Audible offers and growing Audible’s visibility across Amazon’s ecosystem. Aditi is passionate about software architecture and user experience, focusing on building scalable systems with clean, efficient code. When not coding, Aditi enjoys traveling, practicing yoga, and listening to music.

Sam Park is a Software Development Engineer at Audible, focused on building Audible features across Amazon platforms. He has played a key role in enabling Audible purchases through Amazon Cart, as well as expanding Audible’s visibility within the Amazon iOS and Android apps. His work spans multiple touchpoints within the Amazon ecosystem, including Search, Product pages, Checkout, and Cart experiences. Sam is passionate about developing solutions that create intuitive customer experiences and leveraging GenAI to boost development efficiency and productivity. Outside of work, he enjoys traveling, playing basketball, and cheering on the Cleveland Cavaliers.

Multi Agent Collaboration with Strands

Post Syndicated from Aaron Sempf original https://aws.amazon.com/blogs/devops/multi-agent-collaboration-with-strands/

In the evolving landscape of autonomous systems, multi-agent collaboration is becoming not only feasible but necessary. As agents gain more capabilities, like advanced reasoning, adaptation, and tool use, the challenge shifts from individual performance to effective coordination. The question is no longer “can an agent solve a task?” but “how do we organize execution across many intelligent agents?”

A foundational step toward answering this came with the Supervisor pattern, introduced in our article on creating asynchronous AI agents with Amazon Bedrock. The Supervisor addresses the first generation of coordination challenges by acting as a centralized orchestrator, monitoring and delegating tasks across agents in a structured, serverless workflow. It provides asynchronous orchestration, fallback handling, and state tracking across loosely coupled agents, giving organizations a reliable way to move from single-agent prototypes to multi-agent systems.

Yet as agentic systems scale and become more dynamic, the limitations of static supervision become clear. The Supervisor model assumes a relatively stable set of agents and predictable workflows; but modern systems face constantly shifting tasks, emergent capabilities, and the need for adaptive coordination. This is where the Arbiter pattern emerges as the natural evolution: a next-generation supervisory model that extends the Supervisor with dynamic agent generation, semantic task routing, and blackboard-model-based coordination. By addressing the unpredictability and fluidity of large, evolving agent ecosystems, the Arbiter pattern enables systems not only to manage complexity but to thrive in it.

The Arbiter pattern builds directly on this by adding three key capabilities:

  1. Semantic Capability Matching: Instead of only assigning known tasks to known agents, the Arbiter reasons about what kind of agent should exist for a task—even if that agent doesn’t exist yet.
  2. Delegated Agent Creation: If no suitable agent is found, the Arbiter escalates the request to a Fabricator agent that dynamically generates a task-specific agent on demand. This moves beyond delegation to true adaptive generation.
  3. Task Planning + Contextual Memory: Building on the Supervisors task coordination capability, Arbiter decomposes complex inputs into structured task plans, and uses contextual memory to track execution, retry logic, and agent performance.

In short, the Arbiter transforms static orchestration into adaptive coordination.

The Blackboard Model Revisited

To enable loose, extensible collaboration across agents, the Arbiter Pattern incorporates principles from the blackboard model – a classic architecture from distributed AI. In this model, agents contribute opportunistically to a shared data space (the “blackboard”), reacting to changes and collectively solving problems.

Reference: See “The Blackboard Model of Control” (Hayes-Roth et al.), and early applications like Hearsay-II for foundational research.

In our extended Arbiter Pattern, the blackboard becomes a semantic event substrate. Agents, including the Arbiter, publish and consume task-relevant state, enabling loosely coupled, event-driven collaboration.

How It Works

When an event enters the system, the Arbiter takes on the supervisory role but extends it with greater dynamism and adaptability. Like the Supervisor pattern, it begins by interpreting the event and identifying the required objectives and sub-tasks. It then performs a capability assessment, using a local index or peer-published manifests, much like the Supervisor querying an Agents config table.

  1. Interpretation: The Arbiter uses LLM-based reasoning to extract task objectives and sub-tasks.
  2. Capability Assessment: It evaluates which agents can handle each sub-task using a local index or peer capability manifests.
  3. Delegation or Generation:
    • If a suitable agent exists, the task is routed accordingly.
    • If not, the Arbiter sends a generation request to the fabricator agent.
  4. Blackboard Coordination: All agents involved read/write to a shared semantic blackboard, contributing as needed based on observed task state.
  5. Reflection and Adaptation: Performance data is logged and used to inform future agent creation, adaptation, or deprecation.

Arbiter Pattern Architecture

Unlike the Supervisor, which maintains orchestration through a static config list, the Arbiter introduces a shared semantic blackboard that allows all participating agents to read, write, and coordinate based on evolving task state. This blackboard serves as a dynamic collaboration space, enabling mid-task adaptation and richer multi-agent coordination.

The following Diagram 1: Agentic AI Arbiter pattern implemented as a code example can be downloaded here

Architecture diagram of the Arbiter Pattern for Agentic AI. The diagram illustrates the components and flow of the pattern, showing how multiple AI agents interact with an arbiter to coordinate tasks and decision-making in a structured system

Diagram 1: Agentic AI Arbiter pattern

The following sequence describes the Arbiter pattern, according to the numbered steps in the diagram 1: Agentic AI Arbiter pattern

  1. Events entering the system trigger the Supervisor function
  2. Supervisor queries Agents Config table for agent capabilities
  3. Supervisor uses Agents config list as context to plan orchestration of tasks

Option: New Agent:

If no capable agent is found, the Arbiter goes further than the basic supervisor pattern: it issues a generation request to a fabricator agent, which synthesizes new worker code, stores it for runtime access, and updates the capability registry so the agentic system can immediately benefit from the new skill.

  1. Task cannot be completed, request create new capability
  2. Request to fabricate triggers Fabrication agent instance
  3. Fabrication agent queries resources register for available tools (capabilities)
  4. Fabricator generates worker agent code
  5. Worker agent code stored in bucket for runtime access
  6. New worker added to Agents config list with agent capabilities description
  7. Result of fabrication posted to message bus

Repeat steps 1, 2 & 3

Option: Orchestrate workflow:

If a suitable agent exists, the Arbiter orchestrates the workflow by invoking the appropriate worker agents, tracking progress and state as in the Supervisor model.

  1. Orchestration of tasks is stored for tracking end-to-end process
  2. Request to invoke worker agent, by name/id. Add workflow state for agent invocation.
  3. Request to invoke worker agent triggers worker agent wrapper instance
  4. Worker agent wrapper loads agent code
  5. Worker agent reasons and takes action
  6. Worker agent sends response to message bus
  7. Supervisor agent updates workflow state and tracks against orchestration

The Arbiter incorporates a reflection and adaptation loop: performance data from task execution is logged, analyzed, and fed back into the fabricator and coordination logic. This ensures that not only are tasks completed in the moment, but the system continuously adapts, retires underperforming agents, and evolves toward greater efficiency.

The Arbiter Agent: Event Orchestration Engine

The Supervisor Agent (Arbiter Agent) serves as the central coordinator component, managing complex event-driven workflows through intelligent task delegation.

Event Processing Workflow:

The Arbiter pattern follows a structured approach to handle incoming events

  1. Configuration Loading: Loads available agent configurations from Amazon DynamoDB via load_config_from_dynamodb()
  2. LLM Invocation: Invokes Amazon Bedrock LLM with event context and available tool specifications
  3. Decision Analysis: LLM analyzes the event and returns tool invocation decisions with parameters
  4. Task Dispatch: For each specified tool call:
    • Extracts tool name, input parameters, and tool use ID
    • Dispatches message to corresponding Amazon Simple Queue Service (SQS) queue via process_tool_call()
    • Maintains tool invocation list for workflow tracking

Workflow State Management:

The system maintains comprehensive state tracking throughout execution

  • Creates workflow tracking record in DynamoDB with create_workflow_tracking_record()
  • Initializes all invoked agents as incomplete
  • Associates unique request ID with orchestration instance
  • Persists orchestration state including conversation history and request mapping

Completion Coordination:

The Arbiter coordinates task completion through a systematic process

  1. Event Reception: Receives agent completion events via Amazon EventBridge
  2. Status Updates: Updates workflow tracking with update_workflow_tracking()
  3. Completion Check: Performs completion check across all tracked agents
  4. Result Aggregation: When all agents complete:
    • Aggregates results from DynamoDB data field
    • Appends tool results to conversation as user messages
    • Re-invokes orchestration with updated context
  5. Continuation: Continues until LLM provides final response without tool calls

The Fabricator Agent: Dynamic Capability Generation

The Fabricator Agent implements just-in-time agent development using the Strands agents framework, creating new capabilities when required functionality doesn’t exist in the system.

Agent Development Architecture:

The Fabricator operates as a specialized Strands Agent with specific characteristics

  • Implemented as a Strands Agent with specialized system prompt for code generation
  • Triggered by “New worker agent” events from the Arbiter
  • Receives capability requirements through prompt augmentation with agent directive
  • System prompt includes:
    • Strands Agent implementation examples
    • Complete catalog of available Strands Tools
    • Code generation patterns and conventions
    • Standardized handler() function requirements

Code Generation Process:

The agent follows a structured development workflow

  1. Requirement Analysis: LLM analyzes capability requirements and generates Python implementation
  2. Tool Selection: Prioritizes use of existing Strands Tools over custom @tool implementations
  3. Code Structure: Creates agents following standardized patterns:
    • Bedrock model initialization with models.BedrockModel()
    • Agent instantiation with appropriate tool selection
    • Standardized handler() function interface
    • Event-driven completion signaling
  4. File Creation: Writes generated code to /tmp/ directory for immediate availability

Capability Registration Pipeline:

New capabilities are registered through a multi-step process

  1. File Storage: File upload to Amazon Simple Storage Service (S3) via upload_file_to_s3() tool
  2. Metadata Registration: Registration in DynamoDB via store_agent_config_dynamo():
    • toolId: Unique capability identifier
    • filename: S3 object reference
    • schema: OpenAPI specification for LLM tool calling
    • description: Human-readable capability documentation
    • action: SQS queue routing configuration for Generic Wrapper
  3. Completion Notification: Completion event publication to Arbiter via complete_task() tool

Testing Considerations:

The original implementation revealed important insights about testing approaches

  • Previous Approach: Agent testing within the Fabricator resulted in:
    • Unstructured testing leading to false negatives
    • Overzealous optimization of generated agents
  • Recommendation: Separate testing agent with standardized harness for validation feedback

The Generic Wrapper: Dynamic Execution Runtime

The Generic Wrapper implements a hot-loading pattern that enables unlimited agent creation without infrastructure scaling, providing a universal execution environment for Fabricator-generated agents.

This hot-loading approach is critical because it decouples capability growth from infrastructure scaling. Instead of provisioning and maintaining new infrastructure components for every new agent, which could be dozens or even hundreds of agents, the system reuses a single execution wrapper that can dynamically load and execute arbitrary agent code.

This not only makes agent creation effectively limitless but also ensures infrastructure efficiency, cost optimization, and simplified operations, allowing the Arbiter and Fabricator to evolve system capabilities without operational bottlenecks.

In the AWS Samples code, found here, the Hot-loading handler is implemented as am AWS Lambda function, represented in the following code snippet:

def process_event(event, context):
    orchestration_id = event["orchestration_id"]
    tool_use_id = event["tool_use_id"]
    request = event["tool_input"]
    tool_name = event['node']

    # Based on the tool from the event, load the details from DDB
    tool = load_config_from_dynamodb(tool_name)
    config = tool['config']

    if isinstance(config, str):
        config = json.loads(config)

    file_name = config['filename']

    load_file_from_s3_into_tmp(os.environ["AGENT_BUCKET_NAME"], file_name)

    # Hot load the module from the tmp directory
    spec = importlib.util.spec_from_file_location("module.name", "/tmp/loaded_module.py")
    loaded_module = importlib.util.module_from_spec(spec)
    sys.modules["module.name"] = loaded_module
    spec.loader.exec_module(loaded_module)

    # Invoke the generic handler with whatever args were passed in by the Arbiter
    try:
        print("attempting to use module")
        response = loaded_module.handler(**request)
        print(f"response: {response}")
    except Exception as e:
        print(f"error running module: {e}")
        response = "The task could not be completed, this agent has issues, please ignore for now."

    # Finally. report back to the Arbiter. Handled by the wrapper. To avoid the Frabricator from attempting to code this part itself
    post_task_complete(response, tool_use_id, tool_name, orchestration_id)

Although this example is demonstrated through a lambda function, the Hot-Loading code can be executed in Amazon Bedrock AgentCore Runtime, or AWS native container services, such as Amazon Elastic Container Service (ECS) or Amazon Elastic Kubernetes Service (EKS)

Hot-Loading Architecture:

The wrapper implements several key architectural principles

  • Single infrastructure component handles execution of all dynamically created agents
  • Eliminates need for separate infrastructure provisioning per agent
  • Implements runtime code loading from S3 storage
  • Accepts latency trade-off for infrastructure efficiency in non-ultra-low-latency environment

Dynamic Loading Process:

The system follows a precise loading sequence

  1. Message Processing: Extracts agent identifier from incoming SQS message
  2. Configuration Retrieval: Queries DynamoDB for agent configuration via load_config_from_dynamodb()
  3. Code Download: Downloads agent implementation from S3 to /tmp/ directory
  4. Runtime Loading: Module loading using importlib.util:
    • spec_from_file_location() creates module specification
    • module_from_spec() instantiates module object
    • exec_module() performs actual code loading and execution

Execution Management:

The wrapper provides comprehensive execution oversight

  • Invokes standardized handler() function with provided parameters
  • Captures execution results and handles error conditions gracefully
  • Maintains execution isolation between different agent invocations
  • Implements resource cleanup after agent execution completion

Standardized Communication Protocol:

Communication follows strict standardization to ensure system reliability, which is critical in multi-agent environments where dozens or even hundreds of dynamically generated agents may interact. Without consistent message formats, routing rules, and completion signals, orchestration would become brittle, errors would propagate unpredictably, and debugging would be nearly impossible. Standardization guarantees that every agent, no matter when it was created, can interoperate seamlessly, enabling the Arbiter to maintain end-to-end visibility, traceability, and fault-tolerance across the entire system.

Event Handling Principles:

  • Event posting handled exclusively by Generic Wrapper, not individual agents
  • Ensures consistent event-driven communication patterns across all agents

Completion Event Structure:

  • orchestration_id: Workflow context linkage
  • tool_use_id: LLM tool invocation mapping
  • node: Agent identifier for tracking
  • data: Execution results or error information

Reliability Measures:

  • Publishes completion events to EventBridge for Arbiter processing
  • Guarantees workflow tracking receives completion signals regardless of execution outcome

Scalability Characteristics:

The hot-loading approach provides significant scalability benefits

  • Enables agent scaling creation without minimal infrastructure impact
  • S3 download latency acceptable within overall system performance profile
  • Single wrapper instance can execute multiple agent types
  • Memory and resource management handled at container level

Conclusion

The Arbiter Pattern represents a significant evolution beyond the Supervisor architecture, delivering the flexibility required for truly autonomous agentic systems. By introducing semantically rich, context-aware orchestration, it enables dynamic scalability, where agent capabilities grow in step with task demands. The architecture is resilient, redistributing or regenerating tasks when agents fail, and it achieves loose coupling by having agents interact through semantically meaningful events rather than rigid APIs. Most importantly, it embeds continuous adaptation through Arbiter-guided feedback loops, allowing systems to learn and evolve over time. This marks a shift from pre-programmed logic to generative, blackboard-model-based coordination, paving the way for decentralized, intelligent systems that can learn, adapt, and collaborate effectively at scale.

The system delivers several critical capabilities

  • Asynchronous Processing: SQS-based message passing for scalable execution
  • Persistent State Management (Short-term memory): DynamoDB-based workflow tracking
  • Scalability: Hot-loading architecture for unlimited agent creation
  • Intelligent Orchestration: LLM-driven task decomposition and sequencing
  • Self-Expanding Capabilities: Strands-based agent creation on demand
  • Standardized Communication: Reliable event-driven protocols

This architecture enables processing of arbitrary event types by dynamically creating necessary processing capabilities and coordinating their execution through LLM-driven workflow orchestration, while maintaining infrastructure efficiency through hot-loading patterns.


About the Authors

aaron sempfAaron Sempf is Next Gen Tech Lead for the AWS Partner Organization in Asia-Pacific and Japan. With over 20 years in distributed system engineering design and development, he focuses on solving for large scale complex integration and event driven systems. In his spare time, he can be found coding prototypes for autonomous robots, IoT devices, distributed solutions, and designing agentic architecture patterns for generative AI assisted business automation.

josh tothJoshua Toth is a Senior Prototyping Engineer with over a decade of experience in software engineering and distributed systems. He specializes in solving complex business challenges through technical prototypes, demonstrating the art of the possible. With deep expertise in proof of concept development, he focuses on bridging the gap between emerging technologies and practical business applications. In his spare time, he can be found developing next-generation interactive demonstrations and exploring cutting-edge technological innovations.

Lessons from Vibe Coding Three Apps in Three Weeks

Post Syndicated from Jeronimo De Leon original https://www.backblaze.com/blog/lessons-from-vibe-coding-three-apps-in-three-weeks/

A decorative images showing a gear, chips, and the word AI.

While taking some time for paternity leave in a small village in the middle of Bulgaria, I used my baby’s nap times to dive deeper into vibe coding to see just how fast and close these AI tools can get you to building real, production-ready apps. It led to a serious of articles, LinkedIn posts, and product experiments, all focused on understanding and sharing my insights on the state of programming and product design that leverage AI.

In my previous article, “ColabWithMe: A GPT Specialized in Google Colab for Data Analysis & ML,” I talked about how generative AI is redefining the programming landscape. As the Harvard Business Review noted in “We’re All Programmers Now,” this shift represents more than just enabling non-technical employees to code. The real opportunity lies in developing multi-skilled professionals who can operate across domains, compressing innovation cycles from weeks to days. (I explore this further in “The Shape of AI Training: How Skill Profiles Guide AI Learning Paths.“)

Which brings me to what I actually built during those nap times—three different applications using a variety of AI tools. Rather than focusing on polished user interfaces, I focused on backend functionality and core business logic. I discovered that debugging the frontend and getting it to look how I wanted consumed far more time than implementing core backend features. So, many of these vibe-coded apps work nicely on the backend, but need more polish on the frontend. Let’s dig in.

Tools reviewed

Vibe coding means building software by describing what you want in natural language and letting AI generate the code. I tested tools across three categories to see how they enable this new way of building.

  • Integrated development environment (IDE) integrated agents: GitHub Copilot Agent, Gemini Code Assistant, Claude, Cursor.
  • Conversational interfaces: ChatGPT, ChatGPT Codex, Grok, Claude.
  • Prompt app builders: Replit, Lovable, Bolt, GitHub Spark.

Project 1: TickGoals.com, AI-powered goal setting (Approximately 7 hours)

The first application tackled a common productivity challenge: transforming vague aspirations into actionable SMART goals. The system implements a conversational AI interface that guides users through goal refinement, then automatically generates structured milestones and tasks.

Key features:

  • Chat with AI to transform vague goals into structured SMART goals
  • Auto-generate actionable milestones and tasks based on your refined goals
  • To-do list interface for tracking progress and completion
  • Persistent goal storage with progress visualization

Tech stack:

  • React frontend for conversational UI generated by GitHub Spark
  • Firebase Functions for serverless backend processing
  • OpenAI API for goal and task creation
  • Firebase Firestore for persistent goal and task storage

Initially I prototyped across Lovable, Replit, Bolt and GitHub Spark to see what each would generate. I eventually used the code GitHub Spark generated for a cleaner React component structure. Check it out here: https://tickgoals.com

Project 2: NewsVibe.AI, newsletter aggregation and summarization platform (Approximately 12 hours)

While catching up on email, I noticed my inbox was filled with newsletters that I’d often just skim or summarize, so I built a tool to handle this automatically. The app provides users with personalized email addresses for newsletter subscriptions, then presents content in a newsfeed interface to easily scroll through with AI summarization.

Key features:

  • Personal @newsvibe.me email addresses for newsletter subscriptions.
  • Instagram-style scrollable feed displaying all your newsletters.
  • AI-powered summarization to get quick overviews of content.
  • Automatic extraction of links and key information from newsletters.
  • Subscription management dashboard with usage analytics.

Tech stack:

  • Cloudflare pages for frontend hosting.
  • Maileroo for email processing and parsing.
  • Supabase for user management and content storage.
  • Python backend deployed on Render for newsletter and summarization processing.
  • OpenAI API for content summarization.
  • Stripe integration for subscription management.

I split this project into separate frontend and backend repos, and found it blazing fast to build out all the backend functionality first before tackling the frontend.

Project 3: Welcome.AI, newsletter editor agent (Approximately 10 hours)

Welcome AI has been my side project since 2017, initially focused on competitive analysis of AI tools. I’ve rebuilt it multiple times, with the latest iteration using retrieval augmented generation (RAG) for content. But, content curation still required manual review, either by me or community contributors, so I built an agent to automate the entire process, identifying, categorizing, and synthesizing AI news into a publication-ready newsletter. View a generated newsletter here. Subscribe at https://newsletter.welcome.ai/

Key Features:

  • Automatically identifies and filters AI-related news from RSS feeds and newsletters
  • Categorizes stories by topic and summarizes key points
  • Writes complete newsletter copy with insights and summaries
  • Curates the top stories and case studies for featured content sections
  • Generates HTML formatting and generates a feature image for the top story

Tech Stack:

  • Python news feed processing
  • OpenAI Agent SDK and APIs
  • GitHub Actions for automated workflow execution
  • Supabase for content management and curation state
  • Backblaze B2 for generated feature image storage

This was purely a backend project to test and experiment with the OpenAI Agent SDK, though I diverged from it toward more direct large language model (LLM) tasks by the end.

Lessons learned

At a high level, you can definitely see how these tools are going to dramatically speed up development, especially for getting to minimum viable product (MVP) or prototype. You should only need a day or two to get something up and test market traction, especially with prompt app builders.

I found Claude Opus/Claude Code worked best for backend code within the IDE, while Gemini Pro was particularly good at frontend landing page development. Coding agents that make multiple changes across multiple files simultaneously, like those in Cursor, Copilot Agent, or ChatGPT Codex, still felt a bit daunting. I experienced chunks of code being deleted a few times, so I spent considerable time reviewing changes or reverting them.

Prompt app builders like Lovable, Replit, GitHub Spark, and Bolt can get you pretty far, but you can eventually hit a wall where the AI starts breaking more than it fixes, or you need to integrate third-party services that require direct code access. With one project, I started in a prompt builder then moved to an IDE for refinement.

High-level, here are some tips that should help in your vibe coding journey.

Before starting: Set instructions and rules

Like custom instructions in ChatGPT, each tool benefits from coding guidelines: Claude Code uses CLAUDE.md, Copilot uses configured instructions, and Cursor has rules (templates at https://cursor.directory/rules).

A screenshot of provided context for generative code tools.

Both Claude Code and Cursor support MCP (Model Context Protocol) for enhanced integrations (Cursor MCP directory: https://cursor.directory/mcp). Some tools can also index documentation folders for deeper context. Set these up first for better code generation.

Start with a complete product requirements document (PRD)

Before writing any code, spend time iterating with an LLM to generate a thorough PRD. This back-and-forth refinement process goes a long way in providing the context your AI coding tools need. Capture everything: user workflows, UI specifications, technical requirements, and success metrics. Save this in your README.md as your north star.

Prompt app builders like GitHub Spark generate PRDs first from your initial prompt, so the more complete and refined it is, the better.

Define your project structure upfront

Work with the LLM to create a structure that follows best practices but stays simple for what you’re building. An MVP doesn’t need enterprise architecture. Map out where components, services, and APIs belong, and include this in your initial prompt.

A screenshot of the project file structure for one of the vibe coding apps created by Jeronimo De Leon.

Monitor new file generation closely as AI tools can suggest new files when not needed. When this happens, correct it immediately. Keep the structure as simple as possible. Break up files that are doing too many things, as this makes them harder to read and update later.

Add context markers throughout your code

Include file paths and descriptions at the top of each file. This helps the AI maintain context when making changes. Add detailed logging at critical points to track what’s happening when things break. Watch for function renames, LLMs often change function names unnecessarily when updating code, breaking references elsewhere.

Always check current API documentation

LLMs can generate outdated code. OpenAI and Pinecone have changed their import syntax, but AI tools still produce the old versions. Have the LLM search for the latest docs, or check them yourself. Knowing how your services currently work helps you catch these mistakes immediately.

One feature, one conversation

Multitasking with AI means juggling code review while it generates more changes. Keep each conversation focused on a single feature unless features are directly related. When the LLM offers to optimize unrelated areas, decline. If the AI gets stuck repeating failed solutions, start fresh rather than fighting it.

Wisdom of the crowds

When stuck, get code reviews from other LLMs since they can catch different issues. But always review their output carefully. LLMs can duplicate functions across files or, worse, delete essential code. In Agent mode especially, I’ve seen them remove core functionality unrelated to the current task. Give specific instructions about where functions belong and double-check nothing critical disappeared.

Vibe Coding = Product Management + Engineering

The most significant shift with AI-assisted development isn’t the speed; it’s the role change. You’re no longer just implementing; you’re defining what to build, how it should work, and why it matters.

This is the multi-skilled professional evolution I mentioned earlier. When “We’re All Programmers Now,” it means domain experts can build their own solutions, but it also means programmers must become domain experts in product thinking. Success with vibe coding requires clear product vision to articulate requirements, technical knowledge to guide the AI correctly, and relentless focus on user problems.

You become the conductor orchestrating AI capabilities while maintaining the judgment to build what people actually need. The future belongs to these blended roles: product managers who understand engineering deeply enough to guide AI tools, and engineers who think like product managers. These T-shaped and M-shaped professionals operate fluidly across domains. This is how we compress innovation cycles from weeks to days: by eliminating the translation layer between idea and implementation.

The post Lessons from Vibe Coding Three Apps in Three Weeks appeared first on Backblaze Blog | Cloud Storage & Cloud Backup

Overcome development disarray with Amazon Q Developer CLI custom agents

Post Syndicated from Brian Beach original https://aws.amazon.com/blogs/devops/overcome-development-disarray-with-amazon-q-developer-cli-custom-agents/

As a developer who has embraced the power of the Model Context Protocol (MCP)to enhance my workflows, I’m thrilled to see the addition of custom agents in the Amazon Q Developer CLI. This new feature takes the capabilities I’ve come to rely on to a whole new level, allowing me to seamlessly manage different development contexts and easily switch between them.

In my previous post, I discussed how MCP servers have revolutionized the way I interact with AWS services, databases, and other essential tools. MCP integration in Amazon Q Developer allows me to query my database schemas, automate infrastructure deployments, and so much more. However, as I started juggling multiple projects, each with their own unique tech stacks and requirements, I found myself needing a more structured approach to managing these diverse development environments.

Enter custom agents. With this new feature, I can now create and use a custom agent by bringing together specific tools, prompt, context and tool permissions for tasks appropriate for the stage of development. In this post I will explain how to configure a cusom agent for front-end and back-end development. Allowing me to easily optimize Amazon Q Developer for each task.

Background

Imagine that I am working on a multi-tier web application. The application has a React front-end written in Typescript and a FastAPI back-end written in Python. In addition to me, the team includes a designer that uses Figma, and the database administrator that manages a PostgreSQL database. There are subtle differences in how I communicate with the designer and the database administrator. For example, when I discuss a “table” with the designer, I’m likely referring to an HTML table and how the page is structured. However, when I discuss a table with the database administrator, I’m likely talking about a SQL table and how data is stored.

In the past, I had both the Figma Dev Mode MCP server and Amazon Aurora PostgreSQL MCP server configured in my environment. While this allowed me to easily work on either the front-end or back-end code, it introduced some challenges. If I asked Amazon Q Developer “how many tables do I have?” Amazon Q Developer would have to guess if I was talking about HTML tables or SQL tables. If the question is about HTML, it should use the Figma server. If the question is about SQL, it should use the Aurora server. This is not a technical limitation, it’s a language limitation. Just as I have to adjust my assumptions to talk with the designer and database administrator, Amazon Q Developer has to make the same adjustments.

Enter Amazon Q Developer CLI custom agents. Custom agents allow me to optimize Q Developer’s configuration for each scenario. Let’s walk through my front-end and back-end configuration to understand the impact.

Front-end agent

My front-end custom agent is optimized for front-end web development using React and Figma. The following code example is the configuration for my front-end agent stored in ~/.aws/amazonq/agents/front-end.json. Let’s discuss the major sections of the configuration.

  • mcpServers – Here I have configured the Figma Dev Mode MCP Server. This simply communicates with the Figma Web Design App installed locally. Note that this replaces the MCP configuration that was stored in ~/.aws/amazonq/mcp.json
  • tools and allowedTools – These two sections are related, so I will discuss them together. tools defines the tools are available to Amazon Q Developer while allowedTools defines which tools are trusted. In other words, Q Developer is able to use all configured tools, and it does not have to ask my permission to use fs_read, fs_write, and @Figma. @Figma allows Amazon Q Developer to use all Figma tools without asking for permission. More on this in the next section.
  • resources – Here I have configured the files that should be added to the context. I have included the README.md (stored in the project folder) and my own preferences for React (stored in my profile). You can read more in the context management section of the user guide.
  • hooks – In addition to the resources, I have also included a hook. This hook will run a command and inject it into the context at runtime. In the example, I am adding the current git status. You can read more in the context hooks section of the user guide.
{
  "description": "Optimized for front-end web development using React and Figma",
  "mcpServers": {
    "Figma": {
      "command": "npx",
      "args": [
        "mcp-remote",
        "http://127.0.0.1:3845/sse"
      ]
    }
  },
  "tools": ["*"],
  "allowedTools": [
    "fs_read",
    "fs_write",
    "report_issues",
    "@Figma"
  ],
  "resources": [
    "file://README.md",
    "file://~/.aws/amazonq/react-preferences.md"
  ],
  "hooks": {
    "agentSpawn": [
      {
        "command": "git status"
      }
    ]
  }
}

Back-end agent

My back-end custom agent is optimized for back-end development with Python and PostgreSQL. The following code example is the configuration for my back-end agent stored in ~/.aws/amazonq/agents/back-end.json. Rather than describing the sections, as I did earlier, I will focus on the differences between the front-end and back-end.

  • mcpServers – Here I have configured the Amazon Aurora PostgreSQL MCP Server. This allows Amazon Q Developer to query my dev database to learn about the schema. Notice that I have configured a read-only connection to ensure that I don’t accidentally update the database.
  • tools and allowedTools – Once again, I have enabled Amazon Q Developer to use all tools. However, notice that I am more restrictive about what tools are trusted. Amazon Q Developer will need to ask permission to use fs_write or @PostgreSQL/run_query. Notice that I can allow the entire MCP server as I did with Figma or specific tools as I did here.
  • resources – Again, I have included the README.md (stored in the project folder) and my own preferences for Python and SQL (both stored in my profile). Note that I can also use glob patterns here. For example, file://.amazonq/rules/**/*.md would include the rules created by the Amazon Q Developer IDE plugins.
  • hooks – Finally, I have also included the hook for the front-end and back-end. However, I could have included project specific options such as npm run for the front-end and pip freeze for the back-end.
{
  "description": "Optimized for back-end development with Python and PostgreSQL",
  "mcpServers": {
    "PostgreSQL": {
      "command": "uvx",
      "args": [
        "awslabs.postgres-mcp-server@latest",
        "--resource_arn", "arn:aws:rds:us-east-1:xxxxxxxxxxxx:cluster:xxxxxx",
        "--secret_arn", "arn:aws:secretsmanager:us-east-1:xxxxxxxxxxxx:secret:rds!cluster-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxx-xxxxxx",
        "--database", "dev",
        "--region", "us-east-1",
        "--readonly", "True"
      ]
    }
  },
  "tools": ["*"],
  "allowedTools": [
    "fs_read",
    "report_issues",
    "@PostgreSQL/get_table_schema"
  ],
  "resources": [
    "file://README.md",
    "file://~/.aws/amazonq/python-preferences.md",
    "file://~/.aws/amazonq/sql-preferences.md"
  ],
  "hooks": {
    "agentSpawn": [
      {
        "command": "git status"
      }
    ]
  }
}

Using custom agents

The real power of agents becomes evident when I need to switch between these different development contexts. I can now simply run q chat --agent front-end when I am working on React and Figma or q chat --agent back-end when I am working with Python and SQL. Amazon Q Developer will configure the correct agent with all my preferences.

In the following image, you can see the configuration in the Amazon Q Developer CLI. Notice that the front-end agent has an additional tool called Figma while the back-end agent has an additional tool called PostgreSQL. In addition, the front-end agent trusts fs_write and all of the Figma tools while the back-end agent will ask permission to use fs_write and only trusts one of the two PostgreSQL tools.

A split terminal view showing tool permissions for front-end and back-end environments. Both displays list built-in commands like execute_bash, fs_read, fs_write, report_issue, and use_aws, along with their permission status (trusted, not trusted, or trust read-only commands). The front-end environment also shows Figma (MCP) related permissions, while the back-end shows PostgreSQL (MCP) permissions. At the bottom of each view is a note that trusted tools will run without confirmation and instructions to use "/tools help" to edit permissions.

Similarly, let’s look at the context configuration in both the front-end and back-end agents. In the following image, I have included my React preferences for front-end development, and both Python and SQL preferences for back-end development.

A split terminal view showing the output of "/context show" command for both front-end and back-end environments. The front-end agent shows matches for "~/.aws/amazonq/react-preferences.md" and "README.md", while the back-end agent shows matches for "~/.aws/amazonq/python-preferences.md", "~/.aws/amazonq/sql-preferences.md", and "README.md". Each file is marked with "(1 match)" in green text.

As you can see, custom agents allow me to optimize the Amazon Q Developer CLI for each task. Of course, front-end and back-end agents are just an example. You might have a developer and testing agents, data science and analytics agents, etc. Custom agents allow you to tailor the configuration to most any task.

Conclusion

Amazon Q Developer CLI custom agents represent a significant improvement in managing complex development environments. By allowing developers to seamlessly switch between different contexts, they eliminate the cognitive overhead of manually reconfiguring tools and permissions for different tasks. Ready to streamline your development workflow? Get started with Amazon Q Developer today.

Architecting Your AI Data Pipeline Using B2 Overdrive

Post Syndicated from Jeronimo De Leon original https://www.backblaze.com/blog/architecting-your-ai-data-pipeline-using-b2-overdrive/

A decorative image showing cloud storage and AI icons.

When you think about cloud infrastructure for AI, you immediately think of GPUs and other high-performance compute resources, and how your cloud architecture should be optimized to make the most of these expensive compute plans. But compute isn’t the only cloud product category you need to monitor to both scale your application and maintain a sustainable cloud infrastructure budget.

What ultimately fuels AI? Data—lots and lots of data. As part of a healthy AI pipeline, several versions of the same dataset need to be stored in a centralized repository, or multiple repositories if your strategy requires splitting data into cold vs. hot storage to reduce storage costs. For text-based LLMs, storage costs are minimal compared to compute resources. But as AI innovation increasingly relies on video and other media, both the base storage cost and data retrieval fees can make cloud bills spiral out of control.

In this blog, we’re taking a look at the AI data pipeline, where object storage sits in each stage, and how leveraging both Backblaze B2 and B2 Overdrive helps both increase performance and reduce costs for AI applications.

AI data pipeline stages

There are five key AI data pipeline stages where data retrieval and overall performance is critical—and this performance starts with your designated data storage backend.

  • Data ingest and active archive: Data is gathered from multiple designated sources (including APIs, internet of things (IoT) sensors, relational databases, etc.) and ingested into a centralized repository or multiple repositories.
  • Data processing: The raw data is transformed and enriched based on the model’s data parameters. This can range from relatively simple text cleanup to adding annotations and metadata. Feature engineering is performed to extract or construct meaningful attributes. All data is then converted into numerical representations (e.g., embeddings, vectors) suitable for model training and inference.
  • Model experimentation and training: Processed data is used to train models by learning underlying patterns. Iterative experiments in a test environment evaluate, tune, and improve model performance and accuracy.
  • Model deployment and inference: New data is prepared in the same way as during training and sent to the deployed model to generate predictions, support decision-making, and deliver personalized outputs.
  • Monitoring: Continuous monitoring tracks model performance, detects data drift, and flags potential bias, ensuring the model remains accurate and reliable over time.

Keep in mind that data ingestion and processing isn’t always sequential, such as when data is collected and ingested, but corruption is detected during processing. Ideally, your pipeline is configured with validation gates so that corrupt data is identified and handled before proceeding to downstream steps like testing, training, and production deployment.

When using cloud object storage as your data repository, one factor of selecting a plan (like cold versus hot storage) is the specific type of data ingestion that’s being utilized based on both the data source and AI model’s specific needs.

  • Batch ingestion is better suited for mid to lower performance storage, as this is typically used for historical datasets or a set schedule of pre-determined data updates, such as jobs pulling from relational databases or CSV uploads once a day or once per week.
  • Streaming ingestion is well-suited for hot storage to support a continuous stream of real-time (or near-real-time) data processing, such as from social media feeds and high-volume e-commerce AI helper agents.
  • Hybrid ingestion uses a combination of batch and streaming ingestion to handle both historical and real-time data requirements for AI models.

Where does cloud object storage sit in the AI data pipeline?

Everywhere. All scalable data pipelines lead to object storage.

Why? Data ingestion and active archive are the major areas where object storage fulfills an important purpose. When training AI models, especially in production, data scalability for multiple and diverse data types is a hard requirement. But object storage plays a key role in the other pipeline stages:

  • Data processing: Stores versioned outputs from data labeling, feature engineering, and cleaning processes.
  • Model experimentation and training: Provides high-throughput access to training datasets and stores model checkpoints.
  • Model deployment and inference: Stores serialized model artifacts with API-based retrieval for serving predictions at scale.
  • Monitoring: Stores synthetic outputs from generative models, logs, feedback, and performance metrics for analysis and reuse.

For both AI data performance and cost optimization, selecting an object storage product or tier is far from one-size-fits-all. You can strategically allocate your data to B2 Cloud Storage or B2 Overdrive, with your most essential model data stored in B2 Overdrive.  Here’s a high-level diagram of what Backblaze B2 product to use for each stage, including examples of the data stored at each stage.

Learn more at Ai4 in August

Want to learn more? Backblaze is heading to Las Vegas for Ai4 August 11–13! In addition to booking a meeting to speak with our storage experts and stopping by our booth to pick up some swag, I’m excited to talk more about the AI data pipeline during my talk. If you’re attending Ai4, add The AI Pipeline Starts with Storage: Architecting Scalable Data Foundations to your conference agenda.

Can’t attend live in Vegas? Reach out to our Sales team to talk about your specific use case and how B2 Overdrive can help propel your data.

The post Architecting Your AI Data Pipeline Using B2 Overdrive appeared first on Backblaze Blog | Cloud Storage & Cloud Backup

Streamline DevOps troubleshooting: Integrate CloudWatch investigations with Slack

Post Syndicated from Paige Broderick original https://aws.amazon.com/blogs/devops/streamline-devops-troubleshooting-integrate-cloudwatch-investigations-with-slack/

Infrastructure alerts pose a challenge for DevOps teams, particularly when they occur outside of regular business hours. The complexity isn’t merely in receiving notifications, it lies in rapidly assessing their severity and determining the root cause. This challenge is compounded when upstream service disruptions cascade into multiple downstream alerts, creating a confusion of notifications that mask the true source of the problem. DevOps teams find themselves working backwards through a complex web of interconnected services, unsure whether to start investigating at the application, network, or infrastructure level.

To reduce resolution time and alert root cause analysis, AWS introduced CloudWatch Investigations, a generative AI-powered capability within Amazon CloudWatch. Powered by Amazon Q Developer, a generative AI–powered assistant for software development, CloudWatch investigations analyzes multiple metrics, logs, and deployment events to provide suggestions for remediation and root-cause analyses, reducing alarm resolution time. A key advantage of this feature is the ability to integrate these findings directly into Microsoft Teams and Slack, making sure developers and stakeholders receive immediate alerts when issues arise. This centralized collaboration approach enables teams to work together efficiently, reducing duplicate efforts and facilitating consistent problem-solving across the organization.

In this blog post, we will walk through how to integrate CloudWatch Investigations with Slack channels and demonstrate how to interact with investigations in Slack.

Overview of the solution

CloudWatch Investigations can be started in multiple ways, like from existing Amazon CloudWatch log insights, metrics, or alarms. To demonstrate CloudWatch Investigations functionalities, we will use CloudWatch alarms in a sample web application available in the aws-samples GitHub repository. Steps on how to deploy this web app in your AWS environment, via a CloudFormation template, can be found here. You can learn more about the architecture of the resources deployed in the AWS One Observability workshop. If you choose to deploy the sample web application, you will be responsible for all service charges associated with the CloudFormation template deployment. Alternatively, you can use existing CloudWatch alarms in your environment. Examples of common Amazon CloudWatch alarms include: MemoryUtilization, CPUUtiliziation, 5xxErrors and 4xxError. A full list of available alarms can be found here.

For this blog, we will utilize a pre-configured alarm to monitor when one of the website services, backed by an Application Load Balancer, experiences abnormal response times. When the alarm triggers, CloudWatch Investigations automatically initiates an investigation, analyzing both the current alarm state and 90 days of CloudTrail event history to generate hypotheses and determine potential root causes. The investigation insights are published to a Slack channel via Amazon Q Developer in Chat Applications and Amazon Simple Notification Service (SNS).

Figure 1. Architecture diagram of the services involved in the investigation integration in Slack

Prerequisites

  1. Launch the Amazon CloudFormation template associated with the One Observability lab outlined in the AWS Samples GitHub.
  2. Set up a Standard Amazon SNS topic by following the instructions outlined here. To enable CloudWatch investigations to send notifications to Slack, you must add an access policy to the Amazon SNS topic, an example can be found here.
  3. When the topic configuration is complete, navigate to Amazon Q Developer in Chat Applications (formerly AWS Chatbot) to configure the integration between Amazon Q and Slack by following the instructions outlined here. To allow channel members to interact with the investigation in Slack, add the following permission templates to the Channel role settings: Notification Permissions, Amazon Q Permissions, and Amazon Q Operations assistant permissions. More details on these permissions can be located here.

Setting up CloudWatch Investigations

To get started, navigate to the Amazon CloudWatch console. Choose AI Operations and then Configuration.

Figure 2. Configure for this account button within the AWS Console

Before we can set up an investigation, we need to create an investigation group. This is an organizational structure to manage common properties of the investigation like retention requirements, encryption, access permissions and the SNS topic linked. Click Configure for this account and follow the prompts in the console to set up the investigation group. Detailed explanations for each prompt are located in the documentation here. For this demo, we left the default options for steps 1 and 2 of the prompts. In step 3, please select the existing SNS topic created in the prerequisites section.

Figure 3. Select SNS topic for Q Developer Operational Insights

For the investigation trigger, we will use an existing alarm created by the CloudFormation deployment mentioned at the beginning of this blog. The sample alarm is named:

ApplicationInsights/Services/AWS/ApplicationELB/TargetResponseTime/app/Servic-lista-... 

and it goes into ALARM state when one of the website services, backed by an Application Load Balancer, experiences abnormal response times.

To configure this alarm to automatically start an investigation when it goes into an ALARM state:

  1. In the CloudWatch console, choose Alarms, All alarms
  2. Search for the alarm name and click on it
  3. Choose Actions, Edit
  4. Choose Next once to skip the metrics and conditions section
  5. Choose Add investigation action and then select your investigation group as outlined in figure 4
  6. Choose Skip to Preview and create, then choose Update alarm

Figure 4. Configure alarm to automatically start investigations

Testing the solution

At this point, we are ready to test the solution. To simulate a website traffic overload and trigger the alarm, we are going to use Amazon ECS tasks deployed as part of the sample web application. Open up CloudShell and run the following command:

PETLISTADOPTIONS_CLUSTER=$(aws ecs list-clusters | jq '.clusterArns[]|select(contains("PetList"))' -r)

TRAFFICGENERATOR_SERVICE=$(aws ecs list-services --cluster $PETLISTADOPTIONS_CLUSTER | jq '.serviceArns[]|select(contains("trafficgenerator"))' -r)

aws ecs update-service --cluster $PETLISTADOPTIONS_CLUSTER --service $TRAFFICGENERATOR_SERVICE --desired-count 5

The command will launch 5 instances of the Amazon ECS traffic generator container task. Once the tasks are running (after about 5 minutes), the ALB will become overloaded with requests, forcing the alarm into ALARM state as shown below. You should also see a new investigation created.

Figure 5. CloudWatch Alarm in ALARM state

Interacting with the investigation via Slack

Once the alarm is triggered, an investigation is initiated. Since we associated the investigation with an Amazon SNS topic and subscribed our Slack client to it, we can see a message in our Slack channel from Amazon Q as seen in figure 6.

Figure 6. Slack notification for open investigation

Within Slack, channel members can accept useful hypotheses and discard unhelpful ones by clicking on the Accept or Discard button. They can also add text-based notes of observations or evidence to the investigation by clicking on the Add Note button. Amazon Q will respond to messages within the same thread as the original investigation message. Channel members will be able to track who has accepted or discarded messages, as well as notes made about the investigation. This emphasizes the power of Slack integration, as teams can collaborate on the investigation and track who is actively working on it. It is important to note that CloudWatch Investigations uses Generative AI and may provide suggestions different from those below based on your specific account environment.

Figure 7. Accept or discard investigation suggestions from Slack

When integrated with Slack, CloudWatch Investigations can provide suggestions and root-cause hypotheses. Channel members with appropriate permissions can access metrics, charts, and additional information related to the investigation by clicking the blue header at the top of the investigation message. This link will direct users to the CloudWatch Investigations feed in the AWS console as shown below in figure 8.

Figure 8. CloudWatch Investigations in CloudWatch console.

Integrating CloudWatch Investigations with Slack or Teams channels improves developers’ visibility of arising issues and provides targeted recommendations to reduce alarm resolution time. The Accept and Discard buttons make it straightforward to track who is actively working on an investigation, fostering a culture of collaboration. The best part? The integration is quick to set up, especially with existing alarms.

Clean Up

If you launched the CloudFormation template mentioned at the beginning of this blog, the services will continue to run unless you delete them. To make sure that you are not charged for use of the resources after the demo, please follow the below steps to delete the resources created as part of the steps performed on this blog.

  1. Remove the Amazon Q in Chat Applications Slack integration by clicking on Remove Workspace Integration and policy as explained here.
  2. Delete Amazon SNS topic and subscription as explained here.
  3. Remove the CloudWatch Investigations as explained here.
  4. Delete the images under the Amazon ECR repository named cdk-…-container-assets… as explained here.
  5. Open the CloudShell console or AWS CLI and execute the two commands below:
curl https://raw.githubusercontent.com/aws-samples/one-observability-demo/main/PetAdoptions/cdk/pet_stack/resources/destroy_stack.sh | bash

aws cloudformation delete-stack –stack-name CDKToolkit

After executing the above command, the resources of the demo should be destroyed. Look at the CloudFormation console in case of potential errors.

Conclusion

The new CloudWatch Investigations feature reduces alarm resolution time for development teams by providing actionable insights and recommendations. It is straightforward to connect investigations to a team’s primary form of communication, such as Teams or Slack, to improve notification awareness and interaction. To learn more about the capabilities of CloudWatch Investigations check out the feature announcement and documentation.

Happy investigating!

AI & Ransomware: Inside the Exfiltration Playbook

Post Syndicated from Stephanie Doyle original https://www.backblaze.com/blog/ai-ransomware-inside-the-exfiltration-playbook/

A decorative image show icons related to security and ransomware.

Ransomware used to mean locked files and paralyzed systems. But today, bad actors are just as focused on exfiltration—the silent theft of sensitive data—and using that data as leverage for extortion.

According to cybersecurity firm BlackFog, 94% of successful cyberattacks in 2024 involved data exfiltration, either alongside or instead of encryption. Whether it’s stolen patient records, credentials, or source code, the goal is simple: Extract something valuable and threaten to leak it if demands aren’t met.

In this article, we examine how exfiltration became a leading tactic, the trends driving its rise, and what organizations—and cloud storage providers—can do to defend against it.

What is exfiltration?

In cybersecurity, exfiltration refers to the unauthorized transfer of data from a system—often done stealthily, and almost always with malicious intent. Think of it as the digital equivalent of corporate espionage: Data is copied, compressed, and quietly smuggled out. Unlike ransomware encryption, which slams the door in your face, exfiltration leaves the front door looking untouched.

The data being exfiltrated is rarely random. Cybercriminals are increasingly strategic about what they take and why. Common targets include:

  • User credentials
  • Personally identifiable information (PII)
  • Intellectual property and source code
  • Encryption keys
  • Shadow copies or backup snapshots

Tactics include exploiting cloud storage misconfigurations, hijacking legitimate credentials, or disguising traffic as everyday protocols like DNS or HTTPS. Increasingly, data exfiltration happens before the main event—laying the groundwork for extortion, credential stuffing, or resale on underground markets.

Recent cybersecurity trends related to exfiltration

Exfiltration has become the defining feature of modern cyberattacks, and the evidence is growing:

  • Double extortion is now standard. Threat actors exfiltrate data first, then deploy ransomware—or skip the encryption altogether—to maximize leverage. According to the 2023 Unit 42 Report, 70% of ransomware incidents involved data theft.
  • Infostealers, malicious programs designed to covertly harvest sensitive information, are on the rise. Over 2.1 billion credentials were stolen in 2024 alone, with malware like RedLine and Lumma making theft accessible to low-skilled attackers. While cybersecurity task forces (comprised of both government and enterprise actors) have made the news with high-profile disruptions of Lumma and other tools, the ability to use generative AI coding tools has meant that cyber attackers have a shortened time to deployment for malware tools.
  • Time to exfiltration is shrinking. Fortinet’s 2025 Threat Landscape Report notes that attackers can extract data in under five hours, while defenders often take days to respond.
  • Encrypted traffic masks malicious behavior. Emerging exfiltration techniques like QUIC-Exfil use modern, encrypted protocols to evade detection by traditional firewalls.
  • State-sponsored actors prioritize stealth. Nation-state groups like Volt Typhoon have used long-term access to exfiltrate sensitive data undetected for months.

Together, these trends point to a world where stolen data is the main prize—and the threat doesn’t start when the ransom note arrives. It starts when your data quietly leaves the building.

Cloud misconfiguration and its role in exfiltration attacks

Exfiltration doesn’t always require malware—sometimes it only takes a misconfigured storage bucket or firewall rule. Cloud misconfigurations remain a leading cause of breaches, with public buckets, excessive identity and access management (IAM) privileges, and overly permissive network rules exposing data to the open internet.

Attackers exploit these gaps to quietly access or extract data without triggering alerts. A strong cloud posture management strategy—one that includes audit automation, implementing the principle of least privilege, and configuring features like Object Lock or Bucket Access Logs—is critical to reducing exposure.

Defending against exfiltration is a shared responsibility

As exfiltration becomes a primary threat, defense requires collaboration between cloud storage providers and their customers. Here’s how the most effective strategies work together.

Immutable backups and Object Lock

One of the strongest defenses is immutability. Backblaze B2’s Object Lock, for example, allows files to be written once and protected from modification, deletion, or encryption for a set period. Even if attackers compromise credentials, the data cannot be altered or removed.

Visibility and outlier detection

Cloud providers are investing in making advanced logging and behavioral analytics available to users to detect data theft in real time. Some examples of these types of features include:

  • Granular access logging with IP and user-level metadata.
  • Rate limiting and download caps to prevent mass theft.
  • Outlier detection powered by machine learning to catch subtle deviations from baseline activity.

Best practices for customers

Storage-layer defenses work best when paired with customer-side security controls:

  • Adopt zero trust architecture: Never assume implicit trust. Continuously validate users, devices, and behaviors.
  • Use MFA and least-privilege access: Lock down credentials, rotate them regularly, and minimize exposure.
  • Encrypt data at rest and in transit: Use strong encryption standards (AES-256, TLS 1.2+) and managed key systems.
  • Monitor for exfiltration indicators: Watch for abnormal traffic volumes, geographic anomalies, and unexpected protocol usage.
  • Run simulated breach drills: Test your team’s ability to detect and respond to stealthy data leaks.

Cloud storage companies can help provide critical security layers, but stopping exfiltration is ultimately a shared responsibility. Combining provider-level resilience with customer vigilance is the best path forward.

In a world of silent theft, vigilance is your best defense

Exfiltration isn’t just an add-on to ransomware. In this environment, locking the doors isn’t enough—You need to monitor the exits.

By combining immutable backups, smart logging, credential controls, and proactive monitoring, organizations can shift from passive victims to active defenders. The best defenses today aren’t just about blocking access; they’re about knowing what’s leaving and making sure it can’t be used against you.

The post AI & Ransomware: Inside the Exfiltration Playbook appeared first on Backblaze Blog | Cloud Storage & Cloud Backup

AI in the Open Cloud: Optimizing Storage for AI/ML Workloads

Post Syndicated from David Johnson original https://www.backblaze.com/blog/ai-in-the-open-cloud-optimizing-storage-for-ai-ml-workloads/

A decorative image showing a cloud and data graphs.

In a recent survey, a staggering 82% of IT leaders reported experiencing performance issues with their AI workloads within the past year, primarily due to bandwidth and data processing limitations. At the same time, 93% agreed that there’s a greater expectation within their organizations for IT leaders to minimize time-to-revenue for their AI-driven IT infrastructure.

These statistics highlight the predicament that most AI infrastructure and operations teams face today: the challenge of balancing scalability with performance while staying on budget with two of their most expensive operational expense (OpEx) line item costs. Organizations are looking for their AI initiatives to pay off, while IT teams struggle to overcome the unique data challenges they face across the AI model/workload lifecycle—including scalability, performance, and cost management.

Ebook: “Why Object Storage Is Ideal for AI Workflows”

Want to take a deeper dive into the world of object storage? Check out our latest ebook, “Why Object Storage is Ideal for AI Workloads,” and discover the advantages this architecture has to offer across the model lifecycle.

Get the Ebook

Choosing the Right Cloud-Based Object Storage Provider for AI Data: There’s A Lot to Consider

Choosing the right object storage provider is one of the most consequential decisions infrastructure teams make when building AI‑powered applications. A mis-step can introduce hidden costs, brittle performance, and operational friction that put the brakes on time‑to‑insight and undermine ROI. Selecting or transitioning between cloud-based object storage providers demands careful consideration, as capabilities can vary significantly. 

To ensure your AI infrastructure is robust and cost-effective, thoroughly evaluate providers based on several critical factors:

Low latency & high throughput

Performance is critical when selecting a cloud-based object storage provider for AI data. Low latency and high throughput in particular are key as they ensure rapid data access and processing. Low latency minimizes delays in distributing data to GPU clusters, dramatically enhancing training and inference efficiency. Meanwhile, high throughput prevents bottlenecks and improves overall system performance when working with the massive datasets typical of AI applications. 

Reliability & uptime

Reliability is foundational. Even minor downtime can severely impact productivity, halt critical AI processes, and delay strategic objectives. Providers must offer clear service level agreements (SLAs) ensuring high availability, typically at 99.9% uptime or higher. Redundant architectures, data replication across regions, and reliable backup strategies are essential to maintain continuous and uninterrupted data access. Finally, when selecting a cloud-based object storage solution, data durability is table stakes.

Transparent & predictable pricing

Budget predictability is crucial for infrastructure planning and growth forecasting. Complex pricing structures, minimum retention periods, hidden fees for data transfers (egress), API requests, and retrieval charges can quickly erode cost-effectiveness. Providers should offer clear, simple pricing structures with explicit, predictable costs for all services involved. Ideally, charges for common activities such as data retrieval, ingress, and transactions should be minimized or eliminated to facilitate efficient AI workflows without unexpected budget impacts.

Data accessibility

Rapid, consistent data accessibility is non-negotiable for AI applications, especially during model training and inference, where delays can significantly degrade performance and outcomes. Providers offering “cold” storage tiers may appear economical upfront but introduce retrieval latency that could hamper time-sensitive applications. Opting for “hot” or always-on storage tiers ensures data remains immediately accessible without incurring delays, essential for high-performance AI workloads. Data portability is another important consideration for AI workloads, as the ability to freely transfer data to the GPU cloud (or clouds) of your choosing greatly increases flexibility and reduces the risk of lock-in.

Scalability and elasticity

AI initiatives typically experience fluctuating data storage demands, requiring infrastructure that can seamlessly scale with growth. Effective providers offer a scalable storage model capable of handling rapid expansions in data volume without performance degradation or significant architectural changes. Elastic scalability ensures that infrastructure teams can effortlessly manage peaks in data collection, processing, and model training demands.

Security and compliance

Security considerations cannot be overstated, particularly when dealing with sensitive or regulated data. Providers must demonstrate rigorous security standards, including data encryption (at rest and in transit), comprehensive access controls, detailed audit logs, and certifications such as SOC 2 compliance. These measures collectively ensure data integrity, protect against breaches, and ensure compliance with regulatory standards.

Leveraging the open cloud: Making data storage a critical part of your AI workflows

The open cloud is a cloud architecture and philosophy rooted in interoperability, data portability, and freedom from vendor lock-in. Unlike proprietary cloud ecosystems that tether customers to a single provider’s toolsets, APIs, and infrastructure, the open cloud is designed to enable seamless integration across platforms, tools, and environments. It supports open standards and APIs, gives users full control over their data, and allows organizations to choose best-in-class services without being locked into a single ecosystem.

In practical terms, the open cloud supports flexible data movement across public clouds, private clouds, and on-prem environments. It gives organizations the autonomy to mix and match services (e.g., compute from one provider, storage from another) and shift workloads as business or technical needs evolve—without punitive costs or excessive reconfiguration.

As organizations accelerate AI adoption, the open cloud offers clear, strategic advantages across every phase of the AI lifecycle—from data ingestion and preprocessing to training, tuning, and inference.

How Backblaze can help

The Backblaze B2 Cloud Storage platform facilitates smooth integration across various AI tools and platforms, and with Backblaze B2 Overdrive, you get a product designed to move exabyte-scale datasets at up to terabit speeds without the eye-watering price tag. 

  • S3 compatibility: Backblaze’s S3-compatible API ensures easy integration with existing applications and frameworks like TensorFlow and PyTorch. 
  • GPU compute environments: Backblaze partners with GPU providers like Vultr and PureNodal, enabling efficient data processing for training models on high-performance hardware without egress fees. 
  • MLOps platforms: Its compatibility with MLOps workflows allows users to streamline model lifecycle management while leveraging Backblaze’s reliable storage backbone. Together, these integrations simplify the AI deployment process and ensure maximum flexibility across cloud environments.

What makes B2 Overdrive different?

B2 Overdrive gets you all the above, plus it offers a specialized solution at a fraction of competitors’ costs. Here’s what you get:

  • Up to 1Tbps throughput: In other words, the kind of speed that lets you move petabytes of data fast without complex architecture. 
  • Unlimited free egress: Move as much data as you want, whenever you want, to wherever you want. Egress is totally free. 
  • Private networking support: Transfer data at maximum speed through secure private networking connections to your infrastructure.

It’s built on the foundation of our always-hot cloud storage infrastructure, with no minimum file size requirements, no deletion fees, and powerful features like Event Notifications so you can build responsive and automated workflows. We’ll be sharing some of the innovations under the hood in the coming months—so, stay tuned to our series on the engineering behind performance.

The post AI in the Open Cloud: Optimizing Storage for AI/ML Workloads appeared first on Backblaze Blog | Cloud Storage & Cloud Backup

Where and Why Object Storage Excels Throughout the AI Model Lifecycle

Post Syndicated from David Johnson original https://www.backblaze.com/blog/where-and-why-object-storage-excels-throughout-the-ai-model-lifecycle/

A decorative image showing a multi-paned screen backed up by a cloud.

No single technology has changed the way we use data quite like AI. From massive training sets to constant streams of checkpoint and inference data, AI applications are data intensive, to say the least.
Thankfully, there’s an answer. Object storage—with its scalability, flexibility, and cost-effectiveness—is uniquely suited to AI at every stage of the model lifecycle.

In this blog post, we’ll take a quick look at what object storage is, why it’s a perfect fit for AI workloads, and how Backblaze B2 Cloud Storage offers unique advantages for AI teams looking to innovate quickly, easily, and cost-effectively.

What is object storage?

Think of object storage as a giant, organized bucket for all your files. Instead of stuffing things into folders or breaking them into blocks, you just drop each file (an “object”), with a unique tag and some helpful notes (metadata), into your storage solution.

Unlike traditional file or block storage, object storage uses a flat address space. Each object is assigned a unique identifier and can be tagged with rich metadata, making it easy to search, retrieve, and manage at scale.

Because of this unique architecture, object storage is ideal for handling unstructured data—such as images, video, audio, text, and sensor data—which is the meat and potatoes of most modern AI workflows. Also, being cloud-based, object storage is inherently designed for massive scalability and accessibility over the internet (often via S3 API).

Ebook: “Why Object Storage Is Ideal for AI Workflows”

Want to take a deeper dive into the world of object storage? Check out our latest ebook, “Why Object Storage is Ideal for AI Workloads,” and discover the advantages this architecture has to offer across the model lifecycle.

Get the Ebook

Understanding AI’s data storage needs at each stage of the model lifecycle

Before diving into the benefits of object storage, let’s first define and outline the AI model lifecycle. While some may slice and dice it a little differently, generally speaking, we can break the AI model lifecycle down into the following stages:

  • Data ingestion and collection: Massive, often petabyte-scale datasets are gathered from a diversity of sources.
  • Data preparation and storage: Raw data is cleaned, labeled, transformed, and stored for future retrieval and processing.
  • Model training: Data is fed into AI training algorithms, typically deployed across many nodes in a GPU cluster—usually requiring high throughput, parallel access, and lengthy processing times.
  • Deployment and inference: Trained models are deployed into live applications where they take in new data and make inferences based on that data.
  • Monitoring and archiving: Continuous monitoring generates substantial amounts of log data and performance metrics that must be versioned, stored, and archived for compliance or retraining purposes.

As you can see, each stage of the model lifecycle presents its own unique set of data demands—with each one requiring plenty of planning, work, and preparation. And at every one of these stages, matters of scale, speed, accessibility, and cost are mission-critical to a project’s success. 

Where object storage excels: Scalability for data ingestion and collection

Object storage offers virtually unlimited scalability for large, and ever-expanding datasets, making it an ideal solution for the earliest stages of AI development. With no need to create volumes or file systems, organizations can quickly start uploading data to object storage. In addition to this seamless scalability, object storage also shines in its ability to support a diverse range of structured and unstructured data types without the need for rigid hierarchies. In this way, AI teams can ingest all sorts of data to support whatever their unique application needs; and do it quickly and efficiently.

Flexible data preparation and storage

Cloud-based object storage systems are excellent for maintaining easily-accessible, version-controlled datasets that allow for lightning fast iteration and collaboration. Capabilities like version recovery (which allows teams to easily revert datasets to previous states with simple API calls) and concurrent access (which gives multiple team members the ability to work on the same datasets simultaneously without conflicts) are also key to the data preparation and storage phase of AI development.

Reliable, high-performance data storage for model training

For the model training stage of the AI lifecycle, object storage supports parallel access and high throughput, both of which are absolutely essential for GPU-intensive training workloads. Reliable shuttling of large datasets to GPU clusters, wherever they may be, is key for keeping things efficient. Meanwhile, streamlined storage of model checkpoints from those clusters gives teams peace of mind in knowing that a mid-training failure state will not place them all the way back at square one.

Plus, lifecycle management features allow completed or outdated training datasets to be automatically archived—reducing clutter and optimizing storage costs, all while keeping active training data easily accessible.

Efficient versioning for deployment and inference

AI models are always a work in progress. Once deployed and operational, they have to be routinely evaluated and tuned. To that end, object storage makes it easy to store and retrieve a range of valuable information, including model checkpoints, test results, and inference data.

Built-in versioning and object immutability features support reproducibility and audit trails, so you can always trace which data and models produced which results. Together, these capabilities make for robust and effective lifecycle management, significantly boosting reliability and compliance.

Cost-effectiveness and durability for monitoring and archiving

When in the field, continuous monitoring of AI models generates a whole lot of log data and performance metrics. Object storage automates the management of these resources through customizable lifecycle rules, automatically deleting or archiving out-of-date inference logs based on predefined timelines (e.g., after 30–180 days).

This significantly reduces the need for manual oversight, conserves engineering resources, and ensures that relevant performance data remains accessible for compliance and regulatory auditing.

Meanwhile, with the right vendor, object storage solutions can offer competitive pricing models—sometimes including the separation of compute from storage—to ensure cost-effectiveness throughout the late stages of the AI lifecycle. Finally, high durability (of 11 nines or more) and redundancies protect models and datasets which become increasingly valuable over time.

Backblaze B2: Cost-effective, high-performance object storage for your AI workloads

Backblaze B2 Cloud Storage takes all the inherent advantages of cloud-based object storage for AI workloads and amplifies them—through competitive, transparent pricing; reliable, high performance; and seamless integration and support to ensure your project is not only efficient and affordable, but most importantly, successful

  • Competitive, transparent pricing: One-fifth of the cost of most hyperscalers’ solutions, with no hidden costs and three times your total storage volume in free egress included. Plus, fully-transparent, predictable pricing models ensure your organization is fully aware and prepared for the costs associated with your applications. 
  • High performance and reliability: Upload speeds up to 30% faster than AWS S3 for many workloads, plus a 99.9% uptime SLA with 11 nines of durability, ensure always-hot, instantly accessible data for demanding AI workloads. 
  • Seamless adoption and integration, accompanied by expert support: With features like Universal Data Migration and no hidden delete fees, B2 Cloud Storage uniquely streamlines cost-effective data management for AI. Backblaze B2 also boasts S3 API compatibility for true plug-and-play functionality with leading AI and machine learning ops (MLOps) tools and technologies.

Plus, our truly agnostic solution allows organizations to freely and easily connect to any compute or GPU environment (or environments), free of vendor lock-in and fees. And in case you want some support along the way, our team of dedicated solution engineers are available to tailor and fine-tune your architecture and operations to best suit whatever the unique needs of your AI project may be. 

Optimize your AI lifecycle with cloud object storage from Backblaze B2

Data is one of the most important, and most challenging, aspects of AI development. And with their unprecedented data demands, traditional block and file storage systems frequently come up short in supporting modern AI applications. At the same time, legacy cloud storage solutions come with enormous burdens of cost, inflexibility, and the ever-looming threat of lock-in.

Cloud-based object storage offers the perfect solution to all these challenges—with the right mixture of performance, efficiency, and cost-effectiveness that AI projects need.It’s so-well-suited, in fact, that we’ve written an entire white paper on the subject! So, if you’re interested in taking a deeper dive into the topic, check out our ebook, Why Object Storage is Ideal for AI Workloads, today.

The post Where and Why Object Storage Excels Throughout the AI Model Lifecycle appeared first on Backblaze Blog | Cloud Storage & Cloud Backup

5 Cloud Storage Best Practices for AI Workloads

Post Syndicated from David Johnson original https://www.backblaze.com/blog/5-cloud-storage-best-practices-for-ai-workloads/

A decorative image showing various technology icons surrounding a globe.

As organizations race to innovate in AI, efficient, scalable, and cost-effective cloud storage has become key to their success. Whether you’re training massive models or deploying real-time inference pipelines, following best practices for AI storage will help you maximize performance, minimize costs, and ensure the integrity and availability of your most valuable AI asset—data.

In this blog, we’re going to take a look at five of those best practices, to help you get the most  out of your cloud storage solution when working with AI.

Ebook: “Why Object Storage Is Ideal for AI Workflows”

Wondering what type of data architecture makes the most sense for your AI initiatives? Check out our latest ebook, “Why Object Storage is Ideal for AI Workloads,” and learn all the advantages this approach to cloud storage offers across the entire model lifecycle.

Get the Ebook

1. Understand Your Data Lifecycle

You’ve assembled your training data set, loaded it into fast storage next to your GPU compute, and hit the button to start your training. What happens when the training run is complete? If you’re just going to delete that data set, then great—enter rm -r and simply move on.

If not, though, you’ll need to carefully consider the ongoing costs of storage. Leaving that dataset where it is will likely cost you many times over what you’d spend archiving it to a more cost-effective location. By fully understanding and mapping your data lifecycle—and distinguishing between active (e.g., during model training), and inactive data (e.g., archived/dated model versions)—you can manage your storage costs much more efficiently.

2. Check In Your Checkpoints

Training AI models is a delicate, resource-intensive process. Hardware failures, software bugs, and even power outages can derail week-long training runs, wasting precious time and compute resources. 

The two most important steps you can take to avoid these kinds of snags are: 

  • Frequent checkpointing: This means regularly saving a model’s state so you can pick the training process back up from the last checkpoint, rather than starting all over again at square one. 
  • Backup checkpoint data to the cloud: Storing checkpoint data on only local drives alone can be very risky. If the local storage fails, your checkpoints—and all the progress they represent—could be lost. That’s why you should always back up checkpoint data to secure cloud storage solutions as well. This dual approach ensures both speed (for quick recovery) and durability (for disaster recovery), letting you and your team rest easy knowing your hard work is being protected.

3. Keep Your Model Safe

With that same spirit in mind, don’t forget that your models require safekeeping, too. It takes a lot of time and money to train AI models, so protecting them—whether from hardware failure, human error, ransomware attack, or other threats—is absolutely paramount. To safeguard your models:

  • Use your cloud provider’s object lock to prevent accidental or malicious deletion.
  • Implement regular, automated backups of both model binaries and associated metadata.
  • Store critical models in geographically redundant locations for disaster recovery.

These few simple steps can go a very long way to ensuring that your valuable, hard-earned models remain safe and functional, even when things take a turn for the worse.

4. Don’t Lock Your Data Behind a Paywall

Let’s imagine you’re planning your next training run. When looking at cloud providers, you discover that you can realize significant savings by switching GPU compute providers. The only problem is, your current provider will charge you an arm and a leg to move the data to where it needs to be. There’s still a net gain from moving, but you lose significant margin by paying this exorbitant “exit toll,” known as an egress fee.  This is why, before committing to a storage provider, you should carefully review its pricing structures and fees, including the following:

  • Calculate the total cost of moving your data, not just storing it.
  • Consider multi-cloud strategies or providers that offer free or low-cost egress for AI workloads.

By understanding these costs upfront, you retain the flexibility to optimize your infrastructure as business needs evolve, and avoid the all-too-common trap of hidden fees.

5. Do the Mirroring Math: The Replication Equation

Let’s imagine you’ve found yourself a cost-effective storage option with a specialized cloud object storage provider. Even after finding the right solution with the right pricing structure and performance, there are considerations to be made. No matter how quickly you can download the data, if compute and data are in different locations there’s no escaping the fact that your GPUs might be spinning idle waiting for that data to arrive.

To avoid this predicament, break out your calculator and do the “mirroring math”:

  • Calculate the time and cost required to replicate (mirror) data to a location near your GPUs before training starts.
  • Weigh the benefits of lower storage costs against the potential delays and additional storage expenses during training.
  • For large or frequently accessed datasets, it may be worth pre-staging data in high-throughput storage close to your compute.

Ask yourself: Is it faster and/or cheaper to replicate the data upfront to be in close proximity to your GPUs, or does the time required to mirror the data and the additional storage cost during the training run outweigh the benefits? Intelligent data placement—balancing cost, performance, and proximity—ensures your AI workloads run efficiently and cost-effectively.

Building a Future-Proof AI Storage Strategy

The relentless pace of AI innovation demands a storage strategy that is agile, scalable, and cost-effective. Thankfully, the above five best practices can go quite a long way to ensuring the long-term success of your AI project

By understanding the entirety of your data lifecycle—checkpointing wisely, securing your models, avoiding data lock-in, and optimizing data placement—your team is laying the groundwork for sustained AI success. No matter what industry you’re in, these best practices will help to control costs, accelerate innovation, maintain compliance, and protect your team’s most valuable digital assets, in both the near and long term.

Ready to take a deeper dive dive into the topic of storage and AI? Check out our latest ebook, “Why Object Storage is Essential for AI Workloads.

The post 5 Cloud Storage Best Practices for AI Workloads appeared first on Backblaze Blog | Cloud Storage & Cloud Backup

The Hidden Costs of AI: Why Your Cloud Bill is Exploding

Post Syndicated from David Johnson original https://www.backblaze.com/blog/the-hidden-costs-of-ai-why-your-cloud-bill-is-exploding/

A decorative image showing buildings of many sizes.

AI workloads don’t play by the same rules as your average enterprise app, and if you’ve looked at your cloud bill lately, you probably know that already. They have unique demands that make them especially vulnerable to hidden AI storage costs. Think: massive parallel GPU training, nonstop data shuffling, and frequent checkpointing.

The problem? Most cloud pricing models weren’t built for this kind of action. They were designed when workloads were a lot more predictable. So, when you run AI workloads on storage models built by hyperscalers, the costs add up quickly, and often invisibly. 

Download the ebook

Struggling to keep AI storage costs under control? Download our free ebook to discover how to optimize cloud storage for AI workloads—without compromising performance.

Get the Ebook ➔ 

Here are five reasons your cloud bill for AI workloads could spiral out of control:

1. Death by API call: Soaring costs in AI training pipelines.

AI workloads are packed with transactions. Every ingest of raw data, training round, inference batch, or logging step triggers API calls—PUTs, GETs, LISTs, and COPYs. If you’re training a foundational model like Deepseek v3 or Llama 2, you could be making millions of small transactions a day just by uploading all the raw data you require for training.

Each transaction might cost a fraction of a cent—but they add up. 

Example: Let’s assume a model needs 1 trillion pretraining tokens. Different data sources contribute varying numbers of tokens per file. For this exercise, let’s assume the following token counts:

  • Web pages: ~1,000 tokens/page (e.g., blog posts, articles)
  • Books: ~100,000 tokens/book (avg. 300 page novel)
  • Code repositories: ~500 tokens/file (e.g., GitHub scripts)
  • News articles: ~800 tokens/article
  • Academic papers: ~5,000 tokens/paper

A typical large language model (LLM) training mix might look like this:

Source % of tokens Tokens contribution Files required (approx.)
Web pages 40% 400B tokens 400M files
Books 20% 200B tokens 2M files
Code 15% 150B tokens 300M files
News articles 15% 150B tokens 187.5M files
Academic papers 10% 100B tokens 20M files
Total 100% 1T tokens ~909.5M files

If you’re ingesting 909.5 million files to AWS S3 at $0.005 per 1,000 PUTs (pricing as of April 2025), then you’d be charged:

  • 909,500,000 ÷ 1,000 = 909,500 units
  • 909,500 × $0.005 = $4,547.50

That’s $4,547.50 in just PUT transaction fees—for just collecting all the data you need for training. And that’s not counting GETs, LISTs, or any other operations that are necessary to support the full AI data pipeline.

2. The small file tax: How small files drive up AI cloud storage costs

Models trained on image slices, text tokens, or time-series data can create millions of small files. These not only trigger excessive API calls, but also suffer from the following: 

  • Some providers bill you by minimum object size (e.g., rounding all small files up to 128KB).
  • Every small object can trigger a full-priced transaction.
  • Frequent access means you’re paying for reads, not just storage.

This mismatch means your dataset of 100 million 10KB files could behave (and cost) like a much larger, high-churn workload.

3. Why cold storage fails for AI data workloads

Deep archive tiers may be cheap upfront, but they’re a poor fit for iterative AI workflows. Need to rehydrate training data to rerun a model? Get ready to wait hours and pay per retrieval. Need to delete? You could get hit with minimum retention penalties, and pay for that data as if you held onto it for 60, 90, or even 180 days. 

AI workflows are iterative. You’re not archiving log files; you’re experimenting, fine-tuning, and reprocessing constantly. Cold storage is rarely compatible with that.

4. Egress fees: The hidden cost of moving AI training data

Egress is a silent killer. It’s the fee you pay every time you move data out of cloud storage. In AI workflows, that’s often necessary for:

  • Sending training data to a GPU cluster.
  • Validating models on a local system.
  • Migrating to another provider.
  • Collaborating with partners across clouds or regions.

These fees scale linearly with data volume, which is a problem when your AI pipeline is pulling terabytes or petabytes per day. 

5. AI data lifecycle rules can backfire

You might set up lifecycle rules to move infrequently accessed data to cheaper tiers—sounds smart, right?

Except:

  • Lifecycle transitions often come with per-object fees.
  • Accessing those objects later triggers retrieval fees, or breaks performance expectations.
  • Deleting or overwriting too early triggers penalties.

And all of this assumes you even know your data’s “temperature” in advance—which, in AI workflows, changes day to day.

Smarter AI Storage

Your AI pipeline isn’t just a compute problem: It’s a data movement and storage orchestration engine. And that’s exactly where traditional cloud pricing models fall short. 

If your cloud bill is blowing up, it’s probably not just because you kicked off another training run. It’s the millions of GET requests, the silent egress charges, and those archive tier retrievals you didn’t plan for.

The good news? Once you know where the hidden costs are, you can start building smarter.

The post The Hidden Costs of AI: Why Your Cloud Bill is Exploding appeared first on Backblaze Blog | Cloud Storage & Cloud Backup

Building a Conversational AI Chatbot Website with Backblaze B2 + LangChain

Post Syndicated from Pat Patterson original https://www.backblaze.com/blog/building-a-conversational-ai-chatbot-website-with-backblaze-b2-langchain/

A decorative image showing a cloud with the Backblaze logo with code imagery.

In an earlier blog post, I explained how to build your own LLM with Backblaze B2 + Jupyter Notebook, implementing a simple conversational AI chatbot using the LangChain AI framework to implement retrieval-augmented generation (RAG). The notebook walks you through the process of loading PDF files from a Backblaze B2 Bucket into a vector store, running a local instance of a large language model (LLM) and combining those to form a chatbot that can answer questions on its specialist subject.

That article generated a lot of interest, and a few questions:

  • “Could you make this into a web app, like ChatGPT?”
  • “Could you use this with OpenAI? DeepSeek?”
  • “Could I load multiple collections of documents into this?”
  • “Could I run multiple LLMs and compare them?”
  • “Can I add new documents to the vector store as they are uploaded to the bucket?”

The answer to all of these questions is “Yes!”

Today, I’ll present a simple conversational AI chatbot web app with a ChatGPT-style UI that you can easily configure to work with OpenAI, DeepSeek, or any of a range of other LLMs. In future blog posts, I’ll extend this to allow you to configure multiple LLMs and document collections, and integrate with Backblaze B2’s Event Notifications feature to load documents into the vector store within seconds of them being uploaded.

And, here’s a very short video of the chatbot in action:

Editorial note: A version of this article was previously published on the New Stack.

RAG basics

Retrieval-augmented generation, or RAG for short, is a technique that applies the generative features of an LLM to a collection of documents, resulting in a chatbot that can effectively answer questions based on the content of those documents.

A typical RAG implementation splits each document in the collection into a number of roughly equal-sized, overlapping chunks, and generates an embedding for each chunk. Embeddings are vectors (lists) of floating point numbers with hundreds or thousands of dimensions. The distance between two vectors indicates their similarity. Small distances indicate high similarity and large distances indicate low similarity.

The RAG app then loads each chunk, along with its embedding, into a vector store. The vector store is a special-purpose database that can perform a similarity search–given a piece of text, the vector store can retrieve chunks ranked by their similarity to the query text by comparing the embeddings.

Let’s put the pieces together:

Given a question from the user (1), the RAG app can query the vector store for chunks of text that are similar to the question (2). This will be the context that helps the LLM answer the user’s question. Here’s a concrete example using the Backblaze documentation collection: Given the question, “Tell me about object lock,” the vector store returns four document chunks, each of about 170 words, to the app (3). Here is a link to the text of, and a short extract from, each chunk:

  • Object Lock: With Object Lock Legal Hold, files are prevented from being changed or deleted, but the immutability does not have a defined retention period.
  • Object Lock: You can enable Object Lock on a bucket when you create a new bucket or on an existing bucket.
  • Object Lock: Object Lock in Backblaze B2 was designed with the Immutability feature in Veeam Backup & Replication in mind.
  • Enable Object Lock with the S3 Compatible API: To enable Object Lock on an existing S3 compatible bucket, add the x-amz-bucket-object-lock-token header with a value of 1 to the S3 Put Object Lock Configuration API call.

Unsurprisingly, three of the four results are extracts from the main documentation article on Object Lock.

In order to provide a conversational experience, the RAG app maintains a chat history in a message store. The RAG app retrieves the message history (4) to form part of the query it will send to the LLM.

A system prompt sets the guidelines for the LLM to generate responses. In a RAG, the system prompt typically specifies that the response should be based on retrieved document data rather than model memorization, and attempts to prevent hallucinations by ensuring that the model does not fabricate information if no relevant documents are found. Here is the system prompt from the RAG app:

Use the following pieces of context and the message history to answer the question at the end. If you don't know the answer,  just say that you don't know, don't try to make up an answer.

The RAG submits the system prompt, the context, the last few messages in the conversation, and the user’s question to the LLM (5), which generates a, hopefully useful, response (6). Here’s the answer it provided to the above question:

Object Lock is a feature that allows users to enforce immutability on files stored in cloud storage, specifically in services like Backblaze B2 and AWS S3. Here are the key points about Object Lock:

  1. Immutability: Once a file is locked, it cannot be modified or deleted until a specified retention period expires or a legal hold is lifted.
  2. Retention Periods: Users can set a retention period when locking a file. During this time, the file remains immutable, meaning it cannot be changed or deleted.
  3. Legal Hold: Users can place files under a legal hold, which prevents modifications or deletions indefinitely until the hold is removed.
  4. Enabling Object Lock: Object Lock can be enabled for new buckets or existing ones, but it must be turned on at the bucket level before files are uploaded.
  5. Integrations: Some applications, like Veeam Backup & Replication, offer built-in support for Object Lock, making the process easier for users when they want to ensure file immutability during backups.
  6. No Extra Cost: There is no additional fee for using Object Lock; however, standard storage charges for the locked files still apply.
  7. Mistakes in Locking: If a user locks a file for longer than intended, they may need to contact support to resolve the situation, which could involve closing their storage account in some cases.

Overall, Object Lock is a crucial feature for organizations that need to ensure data integrity and compliance with regulations concerning data retention.

The RAG app adds the user’s question and the LLM’s response to the message store (7), returns the answer to the user (8), and awaits the next question.

A quick tour of the sample app

The sample app is on GitHub at https://github.com/backblaze-b2-samples/ai-rag-app. The app is open source, under the MIT license, so you can use it as a basis for your own experimentation without any restrictions. The app was originally written to demonstrate RAG with Backblaze B2 Cloud Storage, but it works with any S3 compatible object store.  

The README file covers configuration and deployment in some detail; in this blog post, I’ll just give you a high-level overview. The sample app is written in Python using the Django web framework. API credentials and related settings are configured via environment variables, while the LLM and vector store are configured via Django’s settings.py file:

CHAT_MODEL: ModelSpec = {
'name': 'OpenAI',
'llm': {
'cls': ChatOpenAI,
'init_args': {
'model': "gpt-4o-mini",
}
},
}

# Change source_data_location and vector_store_location to match your environment
# search_k is the number of results to return when searching the vector store
DOCUMENT_COLLECTION: CollectionSpec = {
'name': 'Docs',
'source_data_location': 's3://blze-ev-ai-rag-app/pdfs',
'vector_store_location': 's3://blze-ev-ai-rag-app/vectordb/docs/openai',
'search_k': 4,
'embeddings': {
'cls': OpenAIEmbeddings,
'init_args': {
'model': "text-embedding-3-large",
},
},
}

The sample app is configured to use OpenAI GPT-4o mini, but the README explains how to use different online LLMs such as DeepSeek V3 or Google Gemini 2.0 Flash, or even a local LLM such as Meta Llama 3.1 via the Ollama framework. If you do run a local LLM, be sure to pick a model that fits your hardware. I tried running Meta’s Llama 3.3, which has 70 billion parameters (70B), on my MacBook Pro with the M1 Pro CPU. It took nearly three hours to answer a single question! Llama 3.1 8B was a much better fit, answering questions in less than 30 seconds.

Notice that the document collection is configured with the location of a vector store containing the Backblaze documentation as a sample dataset. The README file contains an application key with read-only access to the PDFs and vector store so you can try the application without having to load your own set of documents.

If you want to use your own document collection, a pair of custom commands allow you to load them from a Backblaze B2 Bucket into the vector store and then query the vector store to test that it all worked.

First, you need to load your data:

% python manage.py load_vector_store
Deleting existing LanceDB vector store at s3://blze-ev-ai-rag-app/vectordb/docs
Creating LanceDB vector store at s3://blze-ev-ai-rag-app/vectordb/docs
Loading data from s3://blze-ev-ai-rag-app/pdfs in pages of 1000 results
Successfully retrieved page 1 containing 618 result(s) from s3://blze-ev-ai-rag-app/pdfs
Skipping pdfs/.bzEmpty
Skipping pdfs/cloud_storage/.bzEmpty
Loading pdfs/cloud_storage/cloud-storage-about-backblaze-b2-cloud-storage.pdf
Loading pdfs/cloud_storage/cloud-storage-add-file-information-with-the-native-api.pdf
Loading pdfs/cloud_storage/cloud-storage-additional-resources.pdf
...
Loading pdfs/v1_api/s3-put-object.pdf
Loading pdfs/v1_api/s3-upload-part-copy.pdf
Loading pdfs/v1_api/s3-upload-part.pdf
Loaded batch of 614 document(s) from page
Split batch into 2758 chunks
[2025-02-28T01:26:11Z WARN lance_table::io::commit] Using unsafe commit handler. Concurrent writes may result in data loss. Consider providing a commit handler that prevents conflicting writes.
Added chunks to vector store
Added 614 document(s) containing 2758 chunks to vector store; skipped 4 result(s).
Created LanceDB vector store at s3://blze-ev-ai-rag-app/vectordb/docs. "vectorstore" table contains 2758 rows

Now you can verify that the data is stored by querying the vector store. Notice how the raw results from the vector store include an S3 URI identifying the source document:

% python manage.py search_vector_store 'Which B2 native APIs would I use to upload large files?' 
2025-03-01 02:38:07,740 ai_rag_app.management.commands.search INFO Opening vector store at s3://blze-ev-ai-rag-app/vectordb/docs/openai
2025-03-01 02:38:07,740 ai_rag_app.utils.vectorstore DEBUG Populating AWS environment variables from the b2 profile
Found 4 docs in 2.30 seconds
2025-03-01 02:38:11,074 ai_rag_app.management.commands.search INFO
page_content='Parts of a large file can be uploaded and copied in parallel, which can significantly reduce the time it takes to upload terabytes of data. Each part can be anywhere from 5 MB to 5 GB, and you can pick the size that is most convenient for your application. For best upload performance, Backblaze recommends that you use the recommendedPartSize parameter that is returned by the b2_authorize_account operation. To upload larger files and data sets, you can use the command-line interface (CLI), the Native API, or an integration, such as Cyberduck. Usage for Large Files Generally, large files are treated the same as small files. The costs for the API calls are the same. You are charged for storage for the parts that you uploaded or copied. Usage is counted from the time the part is stored. When you call the b2_finish_large_file' metadata={'source': 's3://blze-ev-ai-rag-app/pdfs/cloud_storage/cloud-storage-large-files.pdf'}
...

The core of the sample application is the RAG class. There are several methods that create the basic components of the RAG, but here we’ll look at how the _create_chain() method brings together the system prompt, vector store, message history, and LLM.

First, we define the system prompt, which includes a placeholder for the context—those chunks of text that the RAG will retrieve from the vector store:

# These are the basic instructions for the LLM
system_prompt = (
"Use the following pieces of context and the message history to "
"answer the question at the end. If you don't know the answer, "
"just say that you don't know, don't try to make up an answer. "
"\n\n"
"Context: {context}"
)

Then we create a prompt template that brings together the system prompt, message history, and the user’s question:

# The prompt template brings together the system prompt, context, message history and the user's question
prompt_template = ChatPromptTemplate(
[
("system", system_prompt),
MessagesPlaceholder(variable_name="history", optional=True, n_messages=10),
("human", "{question}"),
]
)

Now we use LangChain Expression Language (LCEL) to bring the various components together to form a chain. LCEL allows us to define a chain of components declaratively; that is, we provide a high-level representation of the chain we want, rather than specifying how the components should fit together. 

Notice the log_data() helper method—it simply logs its input and passes it on to the next component in the chain.

# Create the basic chain
# When loglevel is set to DEBUG, log_input will log the results from the vector store
chain = (
{
"context": (
itemgetter("question")
| retriever
| log_data('Documents from vector store', pretty=True)
),
"question": itemgetter("question"),
"history": itemgetter("history"),
}
| prompt_template
| model
| log_data('Output from model', pretty=True)
)

Assigning a name to the chain allows us to add instrumentation when we invoke it:

# Give the chain a name so the handler can see it
named_chain: Runnable[Input, Output] = chain.with_config(run_name="my_chain")

Now, we use LangChain’s RunnableWithMessageHistory class to manage adding and retrieving messages from the message store:

# Add message history management
return RunnableWithMessageHistory(
named_chain,
lambda session_id: RAG._get_session_history(store, session_id),
input_messages_key="question",
history_messages_key="history",
)

Finally, the log_chain() function prints an ASCII representation of the chain to the debug log:

log_chain(history_chain, logging.DEBUG, {"configurable": {'session_id': 'dummy'}})

This is the output:

The RAG class’ invoke() function, in contrast, is very simple. Here is the key section of code:

response = self._chain.invoke(
{"question": question},
config={
"configurable": {
"session_id": session_key
},
"callbacks": [
ChainElapsedTime("my_chain")
]
},
)

The input to the chain is a Python dictionary containing the question, while the config argument configures the chain with the Django session key and a callback that annotates the chain output with its execution time. Since the chain output contains Markdown formatting, the API endpoint that handles requests from the front end uses the open source markdown-it library to render the output to HTML for display.

The remainder of the code is mostly concerned with rendering the web UI. One interesting facet is that the Django view, responsible for rendering the UI as the page loads, uses the RAG’s message store to render the conversation, so if you reload the page, you don’t lose your context.

Take this code and run it!

The sample AI RAG application is open source under the MIT license, and I encourage you to use it as the basis for your own RAG exploration. The README file suggests a few ways you could extend it, and I also draw your attention to conclusion of the README if you are thinking of running the app in production:

[…] in order to get you started quickly, we streamlined the application in several ways. There are a few areas to attend to if you wish to run this app in a production setting:

Above all, have fun! AI is a rapidly evolving technology, with vendors and open source projects releasing new capabilities every day. I hope you find this app a useful way of jumping in.

The post Building a Conversational AI Chatbot Website with Backblaze B2 + LangChain appeared first on Backblaze Blog | Cloud Storage & Cloud Backup

AI 101: How AI and Ransomware Are Reshaping Cybersecurity

Post Syndicated from Stephanie Doyle original https://www.backblaze.com/blog/ai-101-how-ai-and-ransomware-are-reshaping-cybersecurity/

A decorative image showing a shield, a chip, and the words "AI" over the chip.

AI is rewriting the rules of technology, for better or worse. Arguably one of the most “for better and worse” areas? Ransomware. It’s a full blown billion dollar business, and AI is supercharging both the offense and defense.  

Not only are we seeing AI give bad actors more sophisticated tools and campaigns to target business and consumers alike, we’re also seeing mitigation techniques and technologies deployed by good actors gain equally compelling AI-powered improvements. 

In other words, welcome to the future—where your data is the hostage and the bots are negotiating. Let’s dig in.

Some stage-setting: How much is ransomware costing us?

Despite ransomware payments exceeding an eye-watering $1 billion in 2023—and despite some high profile attacks in 2024, one of which extracted $75 million from a single victim—ransomware attacks actually fell overall in 2024. High profile law enforcement activity, like those against LockBit and BlackCat contributed to a huge drop in the second half of 2024. 

Don’t get too excited though: According to cryptocurrency tracing firm Chainanalysis, that still meant $814 million in 2024. And, the true cost of ransomware includes more than just payments extracted under threat. 

The economic ripple effects of a ransomware attack can include losing C-level talent, having to lay off employees, and ongoing downtime or business closure. Industry-wide, cyber insurance is a growing industry, and 2024 saw a staggering 31% of claims come from third-party risk. 

Reports show that cyber attackers are using ransomware data in new ways, including targeting critical backups and using hostage data to damage organizational reputation

Perhaps most concerningly, ransomware attackers are increasingly using exfiltration as a tactic to double and triple extortion, even using exfiltration data to launch targeted distributed denial-of-service (DDoS) attacks. According to a Check Point’s 2025 Cyber Security Report, some new actors have emerged as exclusively “data-selling platforms,” hosting dedicated data leak sites (DLS) and negotiation platforms.

The good news

  • Machine learning (ML) tools have underpinned modern cyber security techniques for years now—with excellent results. 
  • Sophisticated monitoring tools give us far more granular insights and alerts. 
  • AI-driven behavioral analysis is making it easier to detect anomalies and preempt attacks before they escalate.

What does this mean for defending against ransomware attacks?

Enterprises now have access to security platforms that analyze network behavior in real time, flagging unusual access patterns or lateral movement before a full ransomware payload can deploy. These platforms rely on machine learning models trained on massive datasets of known attack vectors, which allows them to flag and quarantine suspicious activity with impressive accuracy.

The interesting thing is that common knowledge says that “the AI revolution” has been happening recently, and quickly. But, when it comes to cybersecurity defense, many tools have been using ML algorithms for at least two decades. Palo Alto Networks (WildFire), for example, has been using ML since 2003. 

The line between “processing massive datasets and acting up on that info based on programmed parameters” and machine learning is subtle, but important. While the former follows set parameters, machine learning identifies patterns in data—sometimes with human guidance—to decide from multiple possible actions. 

It’s like teaching an assistant a series of tasks they can eventually do on their own. When you think about the progression from basic automation to ML, AI, and deep learning, the shift from rule-based actions to autonomous, chained decisions starts to make a lot of sense.

Zero trust architecture, enhanced by AI, is also gaining momentum. Instead of relying on perimeter-based defenses, AI-enhanced systems enforce granular access controls and continuously verify user and device trust levels. In practice, what this means is that systems no longer assume that you are you on the other end—not without evidence. Combine this with real-time threat intelligence sharing and automated incident response, and enterprises can shorten the window between detection and mitigation drastically. 

The bad news

  • Deep fakes are more convincing. 
  • The ability to generate code means there are more attacks, and those attacks are more sophisticated and responsive. 
  • Cyber criminals of all skill levels have access to more technical tools, including some that are specialized in malware. 
  • Enterprises are adjusting to a new way of working, which can create vulnerabilities.

Generative AI, phishing, and deep fakes

The low-hanging fruit in this discussion is that it’s easy to use generative AI to create more convincing phishing attacks. In the past, bad grammar or non-localized language choices have been an easy way to quickly identify a phishing attack. 

Assisted by generative AI, deep fakes of both the voice and video flavor are getting increasingly difficult to spot—so, while you know your CEO isn’t likely to text you to get a bunch of gift cards or send them company funds via Bitcoin or PayPal, you might believe a video of your CFO or a call from your CEO asking you to transfer funds to accounts that turn out to not be legitimate. 

How is generated code being used by ransomware bad actors?

Just as generative AI models have made everyone a poet, they’re also widely used to generate code. Tools like GitHub Copilot have seen wide adoption amongst enterprises looking to generate and test code. Gartner reports that by 2027, 70% of professional developers will use AI-powered coding tools, up from less than 10% in 2023. 

Given how AI code generation has made code generation easier on enterprises, it’s no surprise that the ransomware industry is following the same adoption trends. By January 2023, this had gone from a hypothetical to a reality, with ransomware bad actors of low levels of technical skill able to leverage LLMs to create malware scripts. 

By July 2023, cybercriminals were already discussing WormGPT, a malicious chatbot trained on ChatGPT which removed standard guardrails against creating illegal or inappropriate content. And, cybersecurity protection firms had executed a proof of concept to demonstrate that AI could generate truly polymorphic code on the fly—a technique used to make it much easier to evade detection by antivirus programs. By July 2024, one study showed that ChatGPT 4 was able to exploit 87% of one-day vulnerabilities. 

Couple that with the fact that ransomware bad actors have opposite success metrics vs. enterprises. Cyber criminals rely on enacting as many attacks as possible, and it only takes one of those attacks succeeding to see a significant upside. Enterprises, on the other hand, only need one failure to see a huge negative impact on their businesses.

What things can you implement to be ransomware ready?

There are a variety of best practices enterprises and users can implement to be more ransomware ready. Organizations like National Institute of Standards and Technology (NIST) and Cybersecurity and Infrastructure Security Agency (CISA) typically publish recommendations, as well as security bulletins and trends within the industry. 

Some of these recommendations are things that users can do on every platform they interact with, such as:  

  • Creating good, strong, unique passwords, and preferably using a password manager: A good password manager reduces password reuse and helps ensure best practices are followed enterprise-wide. 
  • Enabling multifactor authentication (MFA): Multi-factor authentication remains one of the strongest lines of defense, especially when paired with device verification and biometric options. 

On the enterprise side of the house, frameworks like cyber resilience help teams protect data they’ve been entrusted with. And, AI-powered cyber security tools can be a powerful tool in any business’s toolbox. That can look like a number of different things, including: 

  • Investing in AI-powered endpoint detection and response (EDR). These tools continuously monitor and analyze endpoint activities, flagging unusual behavior and isolating threats automatically.
  • Training teams on recognizing deep fakes and AI-enhanced phishing attempts. Security awareness training is evolving fast. Focused, frequent, and AI-aware sessions are critical for employees across departments.
  • Leveraging deception technology. Deploying decoy systems, fake credentials, and honeypots can help trap attackers early and gather valuable intel on their tactics.
  • Running tabletop simulations. Practicing breach scenarios—especially those involving AI-enabled threats—prepares teams to act decisively when seconds matter.

Cyber resilience isn’t static, and neither are the tools and tactics. One of the most important areas an enterprise can invest in is ongoing security and research. Enterprise leaders need to prioritize proactive measures. That means ongoing AI model audits, being nimble in response to new and changing best practices, and investing in cross-functional teams that bring together infosec, legal, and operational leadership. 

The future of AI and ransomware

Let’s level with each other—separately, the AI and ransomware spaces are both changing quickly. When you combine AI and ransomware and try to define how they’re affecting each other, you’re on pretty slippery ground. 

What we’re trying to do here is identify patterns that affect our everyday lives—but we’re also taking a peek at what folks are studying in the research realm, because quantum is just around the corner, and, frankly, too impactful to ignore

So, tell us if we need an update, or if you have another opinion! The comments section is open and we’re happy to chat. 

The post AI 101: How AI and Ransomware Are Reshaping Cybersecurity appeared first on Backblaze Blog | Cloud Storage & Cloud Backup