All posts by Jeronimo De Leon

Building Multimodal AI Data Infrastructure with Pixeltable

2025-11-20 Jeronimo De Leon

Post Syndicated from Jeronimo De Leon original https://www.backblaze.com/blog/building-multimodal-ai-data-infrastructure-with-pixeltable/


We’re approaching a fascinating inflection point in AI development. Research from Epoch AI indicates that high-quality text data will be fully exhausted by 2026 to 2028. As recently as January, OpenAI co-founder Ilya Sutskever said at a conference that all the useful data online had already been used to train models. Over 35% of top websites now block AI scrapers. OpenAI is cutting deals with publishers like The Financial Times because freely available training data is running out.

So what comes next? Multimodal data: video, images, audio, sensor readings. Data that captures how the physical world actually operates, not just how we describe it in text.

Nvidia CEO Jensen Huang highlighted this shift when discussing Tesla’s AI advantage. He noted that the company has a “phenomenal position” because Tesla is collecting massive amounts of real-world data through its AI-enabled factories and autonomous vehicles.

This real-world data, what some call “world data,” is multimodal at its core. It includes video from cameras capturing spatial relationships and motion, sensor telemetry recording physical interactions, images showing object states, and audio capturing environmental context. Video is particularly valuable because it captures temporal dynamics, depth perception, and how objects interact over time, insights that static text or images alone cannot provide.

Here’s the insight most organizations miss: you’re already generating this data.

Your organization is already producing multimodal data

Every single day, your organization produces massive amounts of multimodal data, including:

Zoom calls with video, audio, and screen shares
Security camera footage
Customer service interactions combining chat logs, voice recordings, website screen recordings and product images
Manufacturing sensors producing telemetry alongside quality inspection photos
Marketing teams creating videos, graphics, and campaign documents
Sales demos mixing presentations, product screenshots, and recorded conversations

And that’s just the short list.

The problem isn’t scarcity. It’s how multimodal data gets siloed, deleted, or stored in ways that make it unusable for AI applications. Video sits in one system and transcripts in another, with metadata scattered across databases. Most organizations treat this as operational exhaust rather than the strategic asset it represents.

Organizations that start systematically leveraging their multimodal data today will have capabilities tomorrow that generic models can never match.

The challenge: Multimodal infrastructure complexity

Building AI systems that work across images, video, audio, and text traditionally requires stitching together a fragmented technology stack. Videos live in object storage. Structured data sits in relational databases. Vector embeddings need specialized vector databases. Custom ETL pipelines handle transformations. Orchestration code coordinates everything. You need separate systems for caching, versioning, and lineage tracking.

This “data plumbing” consumes more engineering time than actual AI development. A straightforward workflow like building a searchable video archive with object detection and similarity search requires coordinating five or more systems and writing hundreds of lines of orchestration code.

The complexity creates a barrier that prevents most organizations from leveraging their multimodal data effectively, even when the underlying AI models are accessible through APIs. That’s the gap Pixeltable closes.

How Pixeltable simplifies multimodal data workloads

Pixeltable replaces the fragmented multi-system architecture typically required for AI applications with a single declarative table interface. Instead of coordinating databases, file storage, vector databases, APIs, and orchestration tools separately, you work with tables where multimodal data lives alongside your transformations and AI operations.

The approach is straightforward. Store multimodal data in tables, define transformations as computed columns, and query everything together. Pixeltable handles the orchestration, caching, and model execution automatically.

Connect to data in-place

Point Pixeltable at your existing object stores like AWS S3 or Backblaze B2 Cloud Storage without moving or duplicating data. Your files stay where they are, organized into queryable, versioned tables. No separate databases or vector stores needed.

Define workflows declaratively

Transformations, model inference, and custom logic become Python computed columns. Whether you extract frames from video, run object detection, or generate embeddings, you define the step once, and Pixeltable auto-orchestrates execution, manages dependencies, and handles incremental updates when new data arrives.
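To make the computed-column idea concrete, here is a minimal pure-Python sketch. It is not Pixeltable's API: the `ToyTable` class, the bucket paths, and the 30 fps figure are all assumptions made up for illustration. It only shows the behavior described above: a column is declared once, backfilled for existing rows, and then computed only for newly inserted rows.

```python
from typing import Any, Callable


class ToyTable:
    """In-memory table with computed columns (illustration only, not Pixeltable's API)."""

    def __init__(self) -> None:
        self.rows: list[dict[str, Any]] = []
        self.computed: dict[str, Callable[[dict[str, Any]], Any]] = {}
        self.compute_calls = 0  # counts how many cell computations actually ran

    def add_computed_column(self, name: str, fn: Callable[[dict[str, Any]], Any]) -> None:
        # Declare the column once, then backfill it for rows that already exist.
        self.computed[name] = fn
        for row in self.rows:
            self._fill(row, name)

    def insert(self, row: dict[str, Any]) -> None:
        # Only the new row is computed; existing rows are left untouched.
        new_row = dict(row)
        for name in self.computed:  # declaration order doubles as dependency order
            self._fill(new_row, name)
        self.rows.append(new_row)

    def _fill(self, row: dict[str, Any], name: str) -> None:
        row[name] = self.computed[name](row)
        self.compute_calls += 1


table = ToyTable()
table.insert({"video": "s3://example-bucket/a.mp4", "duration_s": 12})  # hypothetical path
table.add_computed_column("frame_count", lambda r: r["duration_s"] * 30)  # assumes 30 fps
table.add_computed_column("is_long", lambda r: r["frame_count"] > 600)  # uses frame_count
table.insert({"video": "s3://example-bucket/b.mp4", "duration_s": 45})

print(table.rows)
print("cell computations run:", table.compute_calls)
```

In this toy version, the second insert triggers two computations (one per computed column) and leaves the first row alone, which is the incremental behavior the paragraph describes. A real system would also track dependencies explicitly instead of relying on declaration order.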

Query across everything

Leverage semantic search co-located with metadata. Raw data and AI-generated results in one interface. Build RAG systems with auto-synced embedding indexes that eliminate separate vector database management.
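As a rough illustration of what "semantic search co-located with metadata" means, the sketch below keeps file references, metadata, and embeddings in the same rows and answers a filtered similarity query in one pass. The vectors, file names, and the `search` helper are hypothetical, and real embeddings would come from a model rather than being hand-written.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Each row keeps the file reference, metadata, and embedding side by side.
rows = [
    {"uri": "clips/forklift.mp4", "camera": "dock", "embedding": [0.9, 0.1, 0.0]},
    {"uri": "clips/lobby.mp4", "camera": "lobby", "embedding": [0.1, 0.9, 0.0]},
    {"uri": "clips/pallet.mp4", "camera": "dock", "embedding": [0.8, 0.2, 0.1]},
]


def search(query_vec, camera, top_k=2):
    # Metadata filter and similarity ranking happen against the same rows,
    # so there is no second system to keep in sync.
    candidates = [r for r in rows if r["camera"] == camera]
    ranked = sorted(candidates, key=lambda r: cosine(query_vec, r["embedding"]), reverse=True)
    return [r["uri"] for r in ranked[:top_k]]


results = search([1.0, 0.0, 0.0], camera="dock")
print(results)
```

The point of the sketch is the shape of the query, not the search algorithm: a production index would avoid scanning every row.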

Focus on logic, not infrastructure

Full versioning for reproducibility. Automatic incremental processing means only necessary computations run when data changes. The same code works in development and production without rewrites.

For a practical example, explore our companion GitHub notebook Multimodal Data Processing with Pixeltable and Backblaze B2. It demonstrates how to extract and transform video frames using Pixeltable, then store the processed results in Backblaze B2 Cloud Storage with automatic URL generation.

Powering multimodal AI with Pixeltable and Backblaze B2

At Backblaze, we understand how essential multimodal data has become for AI development. Our collaboration with Pixeltable integrates B2 Cloud Storage directly into their open-source framework, giving organizations a simple and scalable foundation for managing complex AI workloads.

Pixeltable’s declarative design works seamlessly with Backblaze B2 across the entire AI data lifecycle. Whether you are processing video for model training, running inference on image streams, or building retrieval-augmented generation systems with multimodal embeddings, Backblaze B2 provides reliable S3-compatible storage that Pixeltable can reference directly without data duplication.

We are working closely with the Pixeltable team on a handful of initiatives to make multimodal workflows easier to deploy and scale. For those exploring this integration, we provide an example that demonstrates how Pixeltable and Backblaze B2 work together across the multimodal AI pipeline.

The data that fuels multimodal AI already exists across most organizations, from meeting recordings to customer interactions, video archives, and sensor logs. With Pixeltable and Backblaze B2, the infrastructure to harness that data effectively is now within reach.

Explore Pixeltable on GitHub or visit pixeltable.com to learn about declarative multimodal data infrastructure. For S3-compatible storage across your AI pipeline, check out Backblaze B2.

The post Building Multimodal AI Data Infrastructure with Pixeltable appeared first on Backblaze Blog | Cloud Storage & Cloud Backup

Lessons from Vibe Coding Three Apps in Three Weeks

2025-09-23 Jeronimo De Leon

Post Syndicated from Jeronimo De Leon original https://www.backblaze.com/blog/lessons-from-vibe-coding-three-apps-in-three-weeks/


While taking some time for paternity leave in a small village in the middle of Bulgaria, I used my baby’s nap times to dive deeper into vibe coding to see just how fast and close these AI tools can get you to building real, production-ready apps. It led to a series of articles, LinkedIn posts, and product experiments, all focused on understanding and sharing my insights on the state of programming and product design that leverage AI.

In my previous article, “ColabWithMe: A GPT Specialized in Google Colab for Data Analysis & ML,” I talked about how generative AI is redefining the programming landscape. As the Harvard Business Review noted in “We’re All Programmers Now,” this shift represents more than just enabling non-technical employees to code. The real opportunity lies in developing multi-skilled professionals who can operate across domains, compressing innovation cycles from weeks to days. (I explore this further in “The Shape of AI Training: How Skill Profiles Guide AI Learning Paths.”)

Which brings me to what I actually built during those nap times—three different applications using a variety of AI tools. Rather than focusing on polished user interfaces, I focused on backend functionality and core business logic. I discovered that debugging the frontend and getting it to look how I wanted consumed far more time than implementing core backend features. So, many of these vibe-coded apps work nicely on the backend, but need more polish on the frontend. Let’s dig in.

Tools reviewed

Vibe coding means building software by describing what you want in natural language and letting AI generate the code. I tested tools across three categories to see how they enable this new way of building.

Integrated development environment (IDE) agents: GitHub Copilot Agent, Gemini Code Assistant, Claude, Cursor.
Conversational interfaces: ChatGPT, ChatGPT Codex, Grok, Claude.
Prompt app builders: Replit, Lovable, Bolt, GitHub Spark.

Project 1: TickGoals.com, AI-powered goal setting (Approximately 7 hours)

The first application tackled a common productivity challenge: transforming vague aspirations into actionable SMART goals. The system implements a conversational AI interface that guides users through goal refinement, then automatically generates structured milestones and tasks.

Key features:

Chat with AI to transform vague goals into structured SMART goals
Auto-generate actionable milestones and tasks based on your refined goals
To-do list interface for tracking progress and completion
Persistent goal storage with progress visualization

Tech stack:

React frontend for conversational UI generated by GitHub Spark
Firebase Functions for serverless backend processing
OpenAI API for goal and task creation
Firebase Firestore for persistent goal and task storage

Initially I prototyped across Lovable, Replit, Bolt and GitHub Spark to see what each would generate. I eventually used the code GitHub Spark generated for a cleaner React component structure. Check it out here: https://tickgoals.com

Project 2: NewsVibe.AI, newsletter aggregation and summarization platform (Approximately 12 hours)

While catching up on email, I noticed my inbox was filled with newsletters that I’d often just skim or summarize, so I built a tool to handle this automatically. The app provides users with personalized email addresses for newsletter subscriptions, then presents content in a newsfeed interface to easily scroll through with AI summarization.

Key features:

Personal @newsvibe.me email addresses for newsletter subscriptions.
Instagram-style scrollable feed displaying all your newsletters.
AI-powered summarization to get quick overviews of content.
Automatic extraction of links and key information from newsletters.
Subscription management dashboard with usage analytics.

Tech stack:

Cloudflare Pages for frontend hosting.
Maileroo for email processing and parsing.
Supabase for user management and content storage.
Python backend deployed on Render for newsletter and summarization processing.
OpenAI API for content summarization.
Stripe integration for subscription management.

I split this project into separate frontend and backend repos, and found it blazing fast to build out all the backend functionality first before tackling the frontend.

Project 3: Welcome.AI, newsletter editor agent (Approximately 10 hours)

Welcome AI has been my side project since 2017, initially focused on competitive analysis of AI tools. I’ve rebuilt it multiple times, with the latest iteration using retrieval-augmented generation (RAG) for content. But content curation still required manual review, either by me or community contributors, so I built an agent to automate the entire process: identifying, categorizing, and synthesizing AI news into a publication-ready newsletter. View a generated newsletter here. Subscribe at https://newsletter.welcome.ai/

Key features:

Automatically identifies and filters AI-related news from RSS feeds and newsletters
Categorizes stories by topic and summarizes key points
Writes complete newsletter copy with insights and summaries
Curates the top stories and case studies for featured content sections
Generates HTML formatting and a feature image for the top story

Tech stack:

Python news feed processing
OpenAI Agent SDK and APIs
GitHub Actions for automated workflow execution
Supabase for content management and curation state
Backblaze B2 for generated feature image storage

This was purely a backend project to test and experiment with the OpenAI Agent SDK, though I diverged from it toward more direct large language model (LLM) tasks by the end.

Lessons learned

At a high level, you can definitely see how these tools are going to dramatically speed up development, especially for getting to a minimum viable product (MVP) or prototype. You should only need a day or two to get something up and test market traction, especially with prompt app builders.

I found Claude Opus/Claude Code worked best for backend code within the IDE, while Gemini Pro was particularly good at frontend landing page development. Coding agents that make multiple changes across multiple files simultaneously, like those in Cursor, Copilot Agent, or ChatGPT Codex, still felt a bit daunting. I experienced chunks of code being deleted a few times, so I spent considerable time reviewing changes or reverting them.

Prompt app builders like Lovable, Replit, GitHub Spark, and Bolt can get you pretty far, but you can eventually hit a wall where the AI starts breaking more than it fixes, or you need to integrate third-party services that require direct code access. With one project, I started in a prompt builder then moved to an IDE for refinement.

Here are some tips that should help in your vibe coding journey.

Before starting: Set instructions and rules

Like custom instructions in ChatGPT, each tool benefits from coding guidelines: Claude Code uses CLAUDE.md, Copilot uses configured instructions, and Cursor has rules (templates at https://cursor.directory/rules).

A screenshot of provided context for generative code tools.

Both Claude Code and Cursor support MCP (Model Context Protocol) for enhanced integrations (Cursor MCP directory: https://cursor.directory/mcp). Some tools can also index documentation folders for deeper context. Set these up first for better code generation.

Start with a complete product requirements document (PRD)

Before writing any code, spend time iterating with an LLM to generate a thorough PRD. This back-and-forth refinement process goes a long way in providing the context your AI coding tools need. Capture everything: user workflows, UI specifications, technical requirements, and success metrics. Save this in your README.md as your north star.

Prompt app builders like GitHub Spark generate PRDs first from your initial prompt, so the more complete and refined it is, the better.

Define your project structure upfront

Work with the LLM to create a structure that follows best practices but stays simple for what you’re building. An MVP doesn’t need enterprise architecture. Map out where components, services, and APIs belong, and include this in your initial prompt.

A screenshot of the project file structure for one of the vibe coding apps created by Jeronimo De Leon.

Monitor new file generation closely, because AI tools can suggest new files when they aren’t needed. When this happens, correct it immediately. Keep the structure as simple as possible. Break up files that are doing too many things, as this makes them harder to read and update later.

Add context markers throughout your code

Include file paths and descriptions at the top of each file. This helps the AI maintain context when making changes. Add detailed logging at critical points to track what’s happening when things break. Watch for function renames: LLMs often change function names unnecessarily when updating code, breaking references elsewhere.
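Here is a small sketch of what those context markers can look like in practice. The file path, module name, and `parse_goal` function are hypothetical and not taken from any of the three apps; the point is the header comment plus logging at the entry, failure, and success points.

```python
# File: services/goal_parser.py  (hypothetical path, shown as a context marker)
# Purpose: turn a free-text goal into a structured record.
import logging

logging.basicConfig(level=logging.INFO, format="%(levelname)s %(name)s: %(message)s")
log = logging.getLogger("goal_parser")


def parse_goal(text: str) -> dict:
    """Split 'title | deadline' into a structured goal; the deadline is optional."""
    log.info("parse_goal called with %d characters", len(text))
    title, _, deadline = text.partition("|")
    goal = {"title": title.strip(), "deadline": deadline.strip() or None}
    if not goal["title"]:
        # Log the offending input before failing so the cause is visible later.
        log.error("empty title in input %r", text)
        raise ValueError("goal title is required")
    log.info("parsed goal: %s", goal)
    return goal


goal = parse_goal("Run a 10k | 2026-05-01")
print(goal)
```

With the path and purpose at the top, an AI tool editing this file has less room to guess where the code lives, and the log lines make it easier to see which step broke after a generated change.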

Always check current API documentation

LLMs can generate outdated code. OpenAI and Pinecone have changed their import syntax, but AI tools still produce the old versions. Have the LLM search for the latest docs, or check them yourself. Knowing how your services currently work helps you catch these mistakes immediately.
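One lightweight way to catch this class of mistake is to check which version of a package is actually installed before trusting generated code. The sketch below uses only the standard library; the helper names are made up for illustration. As one concrete case, the OpenAI Python package moved to a client-object interface in its 1.0 release, so code written for the older module-level calls fails on newer installs.

```python
from importlib import metadata


def parse_major(version: str) -> int:
    """Return the leading major number from a version string such as '1.35.7'."""
    head = version.split(".")[0]
    digits = "".join(ch for ch in head if ch.isdigit())
    return int(digits) if digits else 0


def installed_major(package: str):
    """Return the installed major version of a package, or None if it is not installed."""
    try:
        return parse_major(metadata.version(package))
    except metadata.PackageNotFoundError:
        return None


major = installed_major("openai")
if major is None:
    print("openai is not installed; check the docs for the version you plan to install")
elif major < 1:
    print("pre-1.0 openai installed; generated code must match the older interface")
else:
    print(f"openai {major}.x installed; reject generated code that uses pre-1.0 calls")
```

Pasting the detected version into your prompt is a simple follow-up: it gives the LLM a concrete target instead of whatever version dominated its training data.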

One feature, one conversation

Multitasking with AI means juggling code review while it generates more changes. Keep each conversation focused on a single feature unless features are directly related. When the LLM offers to optimize unrelated areas, decline. If the AI gets stuck repeating failed solutions, start fresh rather than fighting it.

Wisdom of the crowds

When stuck, get code reviews from other LLMs since they can catch different issues. But always review their output carefully. LLMs can duplicate functions across files or, worse, delete essential code. In Agent mode especially, I’ve seen them remove core functionality unrelated to the current task. Give specific instructions about where functions belong and double-check nothing critical disappeared.
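The "nothing critical disappeared" check can be partly automated. As a sketch, the snippet below uses Python's `ast` module to compare the function names defined before and after an AI edit; the sample sources and function names are invented for illustration. It flags both outright deletions and silent renames, since a renamed function shows up as missing under its old name.

```python
import ast


def function_names(source: str) -> set[str]:
    """Collect the names of all functions defined anywhere in a Python source string."""
    tree = ast.parse(source)
    return {
        node.name
        for node in ast.walk(tree)
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
    }


def missing_functions(before: str, after: str) -> set[str]:
    """Names defined in the old source that no longer exist in the new one."""
    return function_names(before) - function_names(after)


before = """
def load_feed(url): ...
def summarize(text): ...
def send_email(body): ...
"""

# Simulated AI edit: one function renamed, one deleted.
after = """
def load_feed(url): ...
def summarise(text): ...
"""

lost = missing_functions(before, after)
print("missing after edit:", sorted(lost))
```

In practice you could feed it the file contents from version control before and after the agent's change. It only covers Python sources and only compares names, so it complements a manual diff review and does not replace it.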

Vibe Coding = Product Management + Engineering

The most significant shift with AI-assisted development isn’t the speed; it’s the role change. You’re no longer just implementing; you’re defining what to build, how it should work, and why it matters.

This is the multi-skilled professional evolution I mentioned earlier. When “We’re All Programmers Now,” it means domain experts can build their own solutions, but it also means programmers must become domain experts in product thinking. Success with vibe coding requires clear product vision to articulate requirements, technical knowledge to guide the AI correctly, and relentless focus on user problems.

You become the conductor orchestrating AI capabilities while maintaining the judgment to build what people actually need. The future belongs to these blended roles: product managers who understand engineering deeply enough to guide AI tools, and engineers who think like product managers. These T-shaped and M-shaped professionals operate fluidly across domains. This is how we compress innovation cycles from weeks to days: by eliminating the translation layer between idea and implementation.

The post Lessons from Vibe Coding Three Apps in Three Weeks appeared first on Backblaze Blog | Cloud Storage & Cloud Backup

Architecting Your AI Data Pipeline Using B2 Overdrive

2025-07-22 Jeronimo De Leon

Post Syndicated from Jeronimo De Leon original https://www.backblaze.com/blog/architecting-your-ai-data-pipeline-using-b2-overdrive/


When you think about cloud infrastructure for AI, you immediately think of GPUs and other high-performance compute resources, and how your cloud architecture should be optimized to make the most of these expensive compute plans. But compute isn’t the only cloud product category you need to monitor to both scale your application and maintain a sustainable cloud infrastructure budget.

What ultimately fuels AI? Data—lots and lots of data. As part of a healthy AI pipeline, several versions of the same dataset need to be stored in a centralized repository, or multiple repositories if your strategy requires splitting data into cold vs. hot storage to reduce storage costs. For text-based LLMs, storage costs are minimal compared to compute resources. But as AI innovation increasingly relies on video and other media, both the base storage cost and data retrieval fees can make cloud bills spiral out of control.

In this blog, we’re taking a look at the AI data pipeline, where object storage sits in each stage, and how leveraging both Backblaze B2 and B2 Overdrive helps both increase performance and reduce costs for AI applications.

AI data pipeline stages

There are five key AI data pipeline stages where data retrieval and overall performance are critical, and this performance starts with your designated data storage backend.

Data ingest and active archive: Data is gathered from multiple designated sources (including APIs, internet of things (IoT) sensors, relational databases, etc.) and ingested into a centralized repository or multiple repositories.
Data processing: The raw data is transformed and enriched based on the model’s data parameters. This can range from relatively simple text cleanup to adding annotations and metadata. Feature engineering is performed to extract or construct meaningful attributes. All data is then converted into numerical representations (e.g., embeddings, vectors) suitable for model training and inference.
Model experimentation and training: Processed data is used to train models by learning underlying patterns. Iterative experiments in a test environment evaluate, tune, and improve model performance and accuracy.
Model deployment and inference: New data is prepared in the same way as during training and sent to the deployed model to generate predictions, support decision-making, and deliver personalized outputs.
Monitoring: Continuous monitoring tracks model performance, detects data drift, and flags potential bias, ensuring the model remains accurate and reliable over time.

Keep in mind that data ingestion and processing aren’t always sequential, such as when data is collected and ingested but corruption is detected during processing. Ideally, your pipeline is configured with validation gates so that corrupt data is identified and handled before proceeding to downstream steps like testing, training, and production deployment.
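A validation gate can be as simple as recomputing a checksum recorded at ingest time and quarantining anything that no longer matches. The sketch below assumes each record carries its payload and a SHA-256 digest captured at ingest; the record layout and object keys are hypothetical.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Hex-encoded SHA-256 digest of a byte string."""
    return hashlib.sha256(data).hexdigest()


def validation_gate(records):
    """Split records into (clean, quarantined) by comparing stored and recomputed checksums."""
    clean, quarantined = [], []
    for rec in records:
        if sha256_hex(rec["payload"]) == rec["sha256"]:
            clean.append(rec)
        else:
            quarantined.append(rec)
    return clean, quarantined


good = b"frame-0001"
records = [
    {"key": "raw/frame-0001.bin", "payload": good, "sha256": sha256_hex(good)},
    # Simulated corruption: the payload was truncated after the digest was recorded.
    {"key": "raw/frame-0002.bin", "payload": b"truncat", "sha256": sha256_hex(b"truncated")},
]

clean, quarantined = validation_gate(records)
print("pass downstream:", [r["key"] for r in clean])
print("quarantined:", [r["key"] for r in quarantined])
```

Only the clean list moves on to training or deployment steps; the quarantined list can be re-ingested from the source or inspected by hand.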

When using cloud object storage as your data repository, one factor in selecting a plan (like cold versus hot storage) is the type of data ingestion being used, which depends on both the data source and the AI model’s specific needs.

Batch ingestion is better suited for mid to lower performance storage, as this is typically used for historical datasets or a set schedule of pre-determined data updates, such as jobs pulling from relational databases or CSV uploads once a day or once per week.
Streaming ingestion is well-suited for hot storage to support a continuous stream of real-time (or near-real-time) data processing, such as from social media feeds and high-volume e-commerce AI helper agents.
Hybrid ingestion uses a combination of batch and streaming ingestion to handle both historical and real-time data requirements for AI models.
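The three ingestion types above can be captured in a small lookup so the choice is explicit in pipeline configuration. This is only a sketch of that mapping: the tier labels are placeholders and not product names, and the two boolean inputs are a simplifying assumption about how a data source would be described.

```python
from enum import Enum


class Ingestion(Enum):
    BATCH = "batch"
    STREAMING = "streaming"
    HYBRID = "hybrid"


# Tier labels are hypothetical placeholders that mirror the list above.
TIERS_FOR_MODE = {
    Ingestion.BATCH: ["lower-performance"],
    Ingestion.STREAMING: ["hot"],
    Ingestion.HYBRID: ["lower-performance", "hot"],
}


def classify(scheduled: bool, realtime: bool) -> Ingestion:
    """Pick an ingestion type from whether a source has scheduled loads, real-time feeds, or both."""
    if scheduled and realtime:
        return Ingestion.HYBRID
    if realtime:
        return Ingestion.STREAMING
    return Ingestion.BATCH


nightly_csv = classify(scheduled=True, realtime=False)
print(nightly_csv.value, "->", TIERS_FOR_MODE[nightly_csv])
```

A nightly CSV upload lands in the batch bucket, while a source with both historical backfills and a live feed would be classified as hybrid and use both tiers.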

Where does cloud object storage sit in the AI data pipeline?

Everywhere. All scalable data pipelines lead to object storage.

Why? Data ingestion and active archive are the major areas where object storage fulfills an important purpose. When training AI models, especially in production, data scalability for multiple and diverse data types is a hard requirement. But object storage plays a key role in the other pipeline stages:

Data processing: Stores versioned outputs from data labeling, feature engineering, and cleaning processes.
Model experimentation and training: Provides high-throughput access to training datasets and stores model checkpoints.
Model deployment and inference: Stores serialized model artifacts with API-based retrieval for serving predictions at scale.
Monitoring: Stores synthetic outputs from generative models, logs, feedback, and performance metrics for analysis and reuse.

For both AI data performance and cost optimization, selecting an object storage product or tier is far from one-size-fits-all. You can strategically allocate your data to B2 Cloud Storage or B2 Overdrive, with your most essential model data stored in B2 Overdrive. Here’s a high-level diagram of which Backblaze B2 product to use for each stage, including examples of the data stored at each stage.

Learn more at Ai4 in August

Want to learn more? Backblaze is heading to Las Vegas for Ai4 August 11–13! In addition to booking a meeting to speak with our storage experts and stopping by our booth to pick up some swag, I’m excited to talk more about the AI data pipeline during my talk. If you’re attending Ai4, add The AI Pipeline Starts with Storage: Architecting Scalable Data Foundations to your conference agenda.

Can’t attend live in Vegas? Reach out to our Sales team to talk about your specific use case and how B2 Overdrive can help propel your data.

The post Architecting Your AI Data Pipeline Using B2 Overdrive appeared first on Backblaze Blog | Cloud Storage & Cloud Backup
