Tag Archives: AI

Algorithms Are Coming for Democracy—but It’s Not All Bad

2024-12-03 Bruce Schneier

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2024/12/algorithms-are-coming-for-democracy-but-its-not-all-bad.html

In 2025, AI is poised to change every aspect of democratic politics—but it won’t necessarily be for the worse.

India’s prime minister, Narendra Modi, has used AI to translate his speeches for his multilingual electorate in real time, demonstrating how AI can help diverse democracies to be more inclusive. AI avatars were used by presidential candidates in South Korea in electioneering, enabling them to provide answers to thousands of voters’ questions simultaneously. We are also starting to see AI tools aid fundraising and get-out-the-vote efforts. AI techniques are starting to augment more traditional polling methods, helping campaigns get cheaper and faster data. And congressional candidates have started using AI robocallers to engage voters on issues. In 2025, these trends will continue. AI doesn’t need to be superior to human experts to augment the labor of an overworked canvasser, or to write ad copy similar to that of a junior campaign staffer or volunteer. Politics is competitive, and any technology that can bestow an advantage, or even just garner attention, will be used.

Most politics is local, and AI tools promise to make democracy more equitable. The typical candidate has few resources, so the choice may be between getting help from AI tools or getting no help at all. In 2024, a US presidential candidate with virtually zero name recognition, Jason Palmer, beat Joe Biden in a very small electorate, the American Samoan primary, by using AI-generated messaging and an online AI avatar.

At the national level, AI tools are more likely to make the already powerful even more powerful. Human + AI generally beats AI only: The more human talent you have, the more you can effectively make use of AI assistance. The richest campaigns will not put AIs in charge, but they will race to exploit AI where it can give them an advantage.

But while the promise of AI assistance will drive adoption, the risks are considerable. When computers get involved in any process, that process changes. Scalable automation, for example, can transform political advertising from one-size-fits-all into personalized demagoguing—candidates can tell each of us what they think we want to hear. Introducing new dependencies can also lead to brittleness: Exploiting gains from automation can mean dropping human oversight, and chaos results when critical computer systems go down.

Politics is adversarial. Any time AI is used by one candidate or party, it invites hacking by those associated with their opponents, perhaps to modify their behavior, eavesdrop on their output, or to simply shut them down. The kinds of disinformation weaponized by entities like Russia on social media will be increasingly targeted toward machines, too.

AI is different from traditional computer systems in that it tries to encode common sense and judgment that goes beyond simple rules; yet humans have no single ethical system, or even a single definition of fairness. We will see AI systems optimized for different parties and ideologies; for one faction not to trust the AIs of a rival faction; for everyone to have a healthy suspicion of corporate for-profit AI systems with hidden biases.

This is just the beginning of a trend that will spread through democracies around the world, and probably accelerate, for years to come. Everyone, especially AI skeptics and those concerned about its potential to exacerbate bias and discrimination, should recognize that AI is coming for every aspect of democracy. The transformations won’t come from the top down; they will come from the bottom up. Politicians and campaigns will start using AI tools when they are useful. So will lawyers, and political advocacy groups. Judges will use AI to help draft their decisions because it will save time. News organizations will use AI because it will justify budget cuts. Bureaucracies and regulators will add AI to their already algorithmic systems for determining all sorts of benefits and penalties.

Whether this results in a better democracy, or a more just world, remains to be seen. Keep watching how those in power uses these tools, and also how they empower the currently powerless. Those of us who are constituents of democracies should advocate tirelessly to ensure that we use AI systems to better democratize democracy, and not to further its worst tendencies.

This essay was written with Nathan E. Sanders, and originally appeared in Wired.

Google Cloud TPU v6e Trillium Shown at SC24

2024-11-30 Eric Smith

Post Syndicated from Eric Smith original https://www.servethehome.com/google-cloud-tpu-v6e-trillium-shown-at-sc24/

At SC24, we saw the Google Cloud TPU v6e Trillium. This is Google’s custom AI accelerator with more performance, interconnect, and memory

The post Google Cloud TPU v6e Trillium Shown at SC24 appeared first on ServeTheHome.

Exploring the benefits of artificial intelligence while maintaining digital sovereignty

2024-11-29 Max Peterson

Post Syndicated from Max Peterson original https://aws.amazon.com/blogs/security/exploring-benefits-of-artificial-intelligence-while-maintaining-digital-sovereignty/

Around the world, organizations are evaluating and embracing artificial intelligence (AI) and machine learning (ML) to drive innovation and efficiency. From accelerating research and enhancing customer experiences to optimizing business processes, improving patient outcomes, and enriching public services, the transformative potential of AI is being realized across sectors. Although using emerging technologies helps drive positive outcomes, leaders worldwide must balance these benefits with the need to maintain security, compliance, and resilience. Many organizations, including those in the public sector and regulated industries, are investing in generative AI applications powered by large language models (LLMs) and other foundation models (FMs) because these applications can transform and scale their work and provide better experiences for customers. Beyond computing power, unlocking this AI potential resides in the AI applications that organizations can create based on a variety of AI/ML development services, models, and data sources. Organizations must navigate the complexity of building AI applications in light of existing and emerging regulatory regimes while verifying that their AI applications and related data are secure, protected, and resilient to risks and threats.

AWS offers a wide range of AI/ML services and capabilities, built on our sovereign-by-design foundation, that are making it simpler for our customers to meet their digital sovereignty needs while getting the security, control, compliance, and resilience that they need. For example, Amazon Bedrock is a fully managed service that offers a choice of high-performing FMs from leading AI companies such as AI21 Labs, Anthropic, Cohere, Meta, Mistral AI, and Stability AI through a single API, along with a broad set of capabilities to build generative AI applications with security, privacy, and responsible AI. Amazon SageMaker provides tools and infrastructure to build, train, and deploy ML models at scale while supporting responsible AI with governance controls and access to pretrained models.

Innovating securely across the AI lifecycle

Security is and always has been our top priority at AWS. AWS customers benefit from our ongoing investment in data centers, networks, custom hardware, and secure software services, built to satisfy the requirements of the most security-sensitive organizations, including the government, healthcare, and financial services. We have always believed that it is essential that customers have control over their data and its location. That’s why we architected the AWS Cloud to be secure and sovereign-by-design from day one. We remain committed to giving our customers more control and choice so that they can use the full power of AWS while meeting their unique digital sovereignty needs.

As organizations develop and implement generative AI, they want to make sure that their data and applications are secured across the AI lifecycle, including data preparation, training, and inferencing. To help ensure the confidentiality and integrity of customer data, all of our Nitro-based Amazon Elastic Compute Cloud (Amazon EC2) instances that run ML accelerators such as AWS Inferentia and AWS Trainium, and graphics processing units (GPUs) such as P4, P5, G5, and G6, are backed by the industry-leading security capabilities of the AWS Nitro System. By design, there is no mechanism for anyone at AWS to access Nitro EC2 instances that customers use to run their workloads. The NCC Group, an independent cybersecurity firm, has validated the design of the Nitro System.

We take a secure approach to generative AI and make it practical for our customers to secure their generative AI workloads across the generative AI stack so that they can focus on building and scaling. All AWS services—including generative AI services—support encryption, and we continue to innovate and invest in controls and encryption features that allow our customers to encrypt everything everywhere.

For example, Amazon Bedrock uses encryption to protect data in transit and at rest, and data remains in the AWS Region where Amazon Bedrock is being used. Customer data, such as prompts, completions, custom models, and data used for fine-tuning or continued pre-training, is not used for Amazon Bedrock service improvement and is never shared with third-party model providers. When customers fine-tune a model in Amazon Bedrock, the data is never exposed to the public internet, never leaves the AWS network, is securely transferred through a customer’s virtual private cloud (VPN), and is encrypted in transit and at rest.

SageMaker protects ML model artifacts and other system artifacts by encrypting data in transit and at rest. Amazon Bedrock and SageMaker integrate with AWS Key Management Service (AWS KMS) so that customers can securely manage cryptographic keys. AWS KMS is designed so that no one—not even AWS employees—can retrieve plaintext keys from the service.

Developing responsibly

The responsible development and use of AI is a priority for AWS. We believe that AI should take a people-centric approach that makes AI safe, fair, secure, and robust. We are committed to supporting customers with responsible AI and helping them build fairer and more transparent AI applications to foster trust, meet regulatory requirements, and use AI to benefit their business and stakeholders. AWS is the first major cloud service provider to announce ISO/IEC 42001 accredited certification for AI services, covering Amazon Bedrock, Amazon Q Business, Amazon Textract, and Amazon Transcribe. ISO/IEC 42001 is an international management system standard that outlines requirements and controls for organizations to promote the responsible development and use of AI systems.

We take responsible AI from theory into practice by providing the necessary tools, guidance, and resources, including Amazon Bedrock Guardrails to help implement safeguards tailored to customer generative AI applications and aligned with their responsible AI policies, or Model Evaluation on Amazon Bedrock to evaluate, compare, and select the best FMs for specific use cases based on custom metrics, such as accuracy, robustness, and toxicity. Additionally, Amazon SageMaker Model Monitor automatically detects and alerts customers of inaccurate predictions from deployed models. We continue to publish AI Service Cards to enhance transparency by providing a single place to find information on the intended use cases and limitations, responsible AI design choices, and performance optimization best practices for our AI services and models.

Building resilience

Resilience plays a pivotal role in the development of any workload, and AI/ML workloads are no different. Customers need to know that their workloads in the cloud will continue to operate in the face of natural disasters, network disruptions, or disruptions due to geopolitical crises. AWS delivers the highest network availability of any cloud provider and is the only cloud provider to offer three or more Availability Zones (AZs) in all Regions, providing more redundancy. Understanding and prioritizing resilience is crucial for generative AI workloads to meet organizational availability and business continuity requirements. We have published guidance on designing generative AI workloads for resilience. To enable higher throughput and enhanced resilience during periods of peak demands in Amazon Bedrock, customers can use cross-region inference to distribute traffic across multiple Regions. For customers with specific European Union data sovereignty requirements, we are launching the AWS European Sovereign Cloud in 2025 to offer an additional layer of control and resilience.

Supporting choice and flexibility

It’s important that customers have access to diverse AI technologies, while having the freedom to choose the right solutions to meet their needs. AWS provides more diversity, choice, and flexibility so that customers can select the AI solution that best aligns with their specific requirements, whether that’s using open-source models, proprietary solutions, or their own custom AI models. For example, we understand the importance of open-source AI in fostering transparency, collaboration, and rapid innovation. Open-source models enable scrutiny of vulnerabilities, drive security improvements, and support research on AI safety. Amazon SageMaker JumpStart provides pretrained, open-source models for a wide range of common use cases. To provide practitioners and developers with the guidance and tools that they need to create secure-by-design AI systems, we are a founding member of the open-source initiative Coalition for Secure AI (CoSAI).

Also, our commitment to portability and interoperability helps ensure that customers can move easily between environments. For customers changing IT providers, we’ve taken concrete steps to lower costs, and AWS is actively engaged in efforts to facilitate switching between cloud providers, including through our support of the Cloud Infrastructure Service Providers in Europe (CISPE) Cloud Switching Framework, which lays out guidance to assist providers and customers in the switching process. This gives organizations the flexibility to adapt their cloud and AI strategies as their needs evolve.

We remain committed to providing customers with a choice of diverse AI technologies, along with secure and compliant ways to build their AI applications throughout the development lifecycle. Through this approach, customers can enhance the security, compliance, and resilience of their systems.

If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, contact AWS Support.

DO it again: how we used Durable Objects to add WebSockets support and authentication to AI Gateway

2024-11-20 Catarina Pires Mota

Post Syndicated from Catarina Pires Mota original https://blog.cloudflare.com/do-it-again

In October 2024, we talked about storing billions of logs from your AI application using AI Gateway, and how we used Cloudflare’s Developer Platform to do this.

With AI Gateway already processing over 3 billion logs and experiencing rapid growth, the number of connections to the platform continues to increase steadily. To help developers manage this scale more effectively, we wanted to offer an alternative to implementing HTTP/2 keep-alive to maintain persistent HTTP(S) connections, thereby avoiding the overhead of repeated handshakes and TLS negotiations with each new HTTP connection to AI Gateway. We understand that implementing HTTP/2 can present challenges, particularly when many libraries and tools may not support it by default and most modern programming languages have well-established WebSocket libraries available.

With this in mind, we used Cloudflare’s Developer Platform and Durable Objects (yes, again!) to build a WebSockets API that establishes a single, persistent connection, enabling continuous communication.

Through this API, all AI providers supported by AI Gateway can be accessed via WebSocket, allowing you to maintain a single TCP connection between your client or server application and the AI Gateway. The best part? Even if your chosen provider doesn’t support WebSockets, we handle it for you, managing the requests to your preferred AI provider.

By connecting via WebSocket to AI Gateway, we make the requests to the inference service for you using the provider’s supported protocols (HTTPS, WebSocket, etc.), and you can keep the connection open to execute as many inference requests as you would like.

To make your connection to AI Gateway more secure, we are also introducing authentication for AI Gateway. The new WebSockets API will require authentication. All you need to do is create a Cloudflare API token with the permission “AI Gateway: Run” and send that in the cf-aig-authorization header.

In the flow diagram above:

1️⃣ When Authenticated Gateway is enabled and a valid token is included, requests will pass successfully.

2️⃣ If Authenticated Gateway is enabled, but a request does not contain the required cf-aig-authorization header with a valid token, the request will fail. This ensures only verified requests pass through the gateway.

3️⃣ When Authenticated Gateway is disabled, the cf-aig-authorization header is bypassed entirely, and any token — whether valid or invalid — is ignored.

How we built it

We recently used Durable Objects (DOs) to scale our logging solution for AI Gateway, so using WebSockets within the same DOs was a natural fit.

When a new WebSocket connection is received by our Cloudflare Workers, we implement authentication in two ways to support the diverse capabilities of WebSocket clients. The primary method involves validating a Cloudflare API token through the cf-aig-authorization header, ensuring the token is valid for the connecting account and gateway.

However, due to limitations in browser WebSocket implementations, we also support authentication via the “sec-websocket-protocol” header. Browser WebSocket clients don’t allow for custom headers in their standard API, complicating the addition of authentication tokens in requests. While we don’t recommend that you store API keys in a browser, we decided to add this method to add more flexibility to all WebSocket clients.

// Built-in WebSocket client in browsers
const socket = new WebSocket("wss://gateway.ai.cloudflare.com/v1/my-account-id/my-gateway/", [
   "cf-aig-authorization.${AI_GATEWAY_TOKEN}"
]);


// ws npm package
import WebSocket from "ws";
const ws = new WebSocket("wss://gateway.ai.cloudflare.com/v1/my-account-id/my-gateway/",{
   headers: {
       "cf-aig-authorization": "Bearer AI_GATEWAY_TOKEN",
   },
});

After this initial verification step, we upgrade the connection to the Durable Object, meaning that it will now handle all the messages for the connection. Before the new connection is fully accepted, we generate a random UUID, so this connection is identifiable among all the messages received by the Durable Object. During an open connection, any AI Gateway settings passed via headers — such as cf-aig-skip-cache (which bypasses caching when set to true) — are stored and applied to all requests in the session. However, these headers can still be overridden on a per-request basis, just like with the Universal Endpoint today.

How it works

Once the connection is established, the Durable Object begins listening for incoming messages. From this point on, users can send messages in the AI Gateway universal format via WebSocket, simplifying the transition of your application from an existing HTTP setup to WebSockets-based communication.

import WebSocket from "ws";
const ws = new WebSocket("wss://gateway.ai.cloudflare.com/v1/my-account-id/my-gateway/",{
   headers: {
       "cf-aig-authorization": "Bearer AI_GATEWAY_TOKEN",
   },
});

ws.send(JSON.stringify({
   type: "universal.create",
   request: {
      "eventId": "my-request",
      "provider": "workers-ai",
      "endpoint": "@cf/meta/llama-3.1-8b-instruct",
      "headers": {
         "Authorization": "Bearer WORKERS_AI_TOKEN",
         "Content-Type": "application/json"
      },
      "query": {
         "prompt": "tell me a joke"
      }
   }
}));

ws.on("message", function incoming(message) {
   console.log(message.toString())
});

When a new message reaches the Durable Object, it’s processed using the same code that powers the HTTP Universal Endpoint, enabling seamless code reuse across Workers and Durable Objects — one of the key benefits of building on Cloudflare.

For non-streaming requests, the response is wrapped in a JSON envelope, allowing us to include additional information beyond the AI inference itself, such as the AI Gateway log ID for that request.

Here’s an example response for the request above:

{
  "type":"universal.created",
  "metadata":{
     "cacheStatus":"MISS",
     "eventId":"my-request",
     "logId":"01JC3R94FRD97JBCBX3S0ZAXKW",
     "step":"0",
     "contentType":"application/json"
  },
  "response":{
     "result":{
        "response":"Why was the math book sad? Because it had too many problems. Would you like to hear another one?"
     },
     "success":true,
     "errors":[],
     "messages":[]
  }
}

For streaming requests, AI Gateway sends an initial message with request metadata telling the developer the stream is starting.

{
  "type":"universal.created",
  "metadata":{
     "cacheStatus":"MISS",
     "eventId":"my-request",
     "logId":"01JC40RB3NGBE5XFRZGBN07572",
     "step":"0",
     "contentType":"text/event-stream"
  }
}

After this initial message, all streaming chunks are relayed in real-time to the WebSocket connection as they arrive from the inference provider. Note that only the eventId field is included in the metadata for these streaming chunks (more info on what this new field is below).

{
  "type":"universal.stream",
  "metadata":{
     "eventId":"my-request",
  }
  "response":{
     "response":"would"
  }
}

This approach serves two purposes: first, all request metadata is already provided in the initial message. Second, it addresses the concurrency challenge of handling multiple streaming requests simultaneously.

Handling asynchronous events

With WebSocket connections, client and server can send messages asynchronously at any time. This means the client doesn’t need to wait for a server response before sending another message. But what happens if a client sends multiple streaming inference requests immediately after the WebSocket connection opens?

In this case, the server streams all the inference responses simultaneously to the client. Since everything occurs asynchronously, the client has no built-in way to identify which response corresponds to each request.

To address this, we introduced a new field in the Universal format called eventId, which allows AI Gateway to include a client-defined ID with each message, even in a streaming WebSocket environment.

So, to fully answer the question above: the server streams both responses in parallel chunks, and the client can accurately identify which request each message belongs to based on the eventId.

Once all chunks for a request have been streamed, AI Gateway sends a final message to signal the request’s completion. For added flexibility, this message includes all the metadata again, even though it was also provided at the start of the streaming process.

{
  "type":"universal.done",
  "metadata":{
     "cacheStatus":"MISS",
     "eventId":"my-request",
     "logId":"01JC40RB3NGBE5XFRZGBN07572",
     "step":"0",
     "contentType":"text/event-stream"
  }
}

Try it out today

AI Gateway’s real-time Websocket API is now in beta and open to everyone!

To try it out, copy your gateway universal endpoint URL, and replace the “https://” with “wss://”, like this:

wss://gateway.ai.cloudflare.com/v1/my-account-id/my-gateway/

Then open a WebSocket connection using your Universal Endpoint, and guarantee that it is authenticated with a Cloudflare token with the AI Gateway Run permission.

Here’s an example code using the ws npm package:

import WebSocket from "ws";
const ws = new WebSocket("wss://gateway.ai.cloudflare.com/v1/my-account-id/my-gateway/", {
   headers: {
       "cf-aig-authorization": "Bearer AI_GATEWAY_TOKEN",
   },
});

ws.on("open", function open() {
   console.log("Connected to server.");
   ws.send(JSON.stringify({
      type: "universal.create",
      request: {
         "provider": "workers-ai",
         "endpoint": "@cf/meta/llama-3.1-8b-instruct",
         "headers": {
            "Authorization": "Bearer WORKERS_AI_TOKEN",
            "Content-Type": "application/json"
         },
         "query": {
            "stream": true,
            "prompt": "tell me a joke"
         }
      }
   }));
});


ws.on("message", function incoming(message) {
   console.log(message.toString())
});

Here’s an example code using the built-in browser WebSocket client:

const socket = new WebSocket("wss://gateway.ai.cloudflare.com/v1/my-account-id/my-gateway/", [
   "cf-aig-authorization.${AI_GATEWAY_TOKEN}"
]);

socket.addEventListener("open", (event) => {
  console.log("Connected to server.");
   socket.send(JSON.stringify({
      type: "universal.create",
      request: {
         "provider": "workers-ai",
         "endpoint": "@cf/meta/llama-3.1-8b-instruct",
         "headers": {
            "Authorization": "Bearer WORKERS_AI_TOKEN",
            "Content-Type": "application/json"
         },
         "query": {
            "stream": true,
            "prompt": "tell me a joke"
         }
      }
   }));
});

socket.addEventListener("message", (event) => {
  console.log(event.data);
});

And we will DO it again

In Q1 2025, we plan to support WebSocket-to-WebSocket connections (using DOs), allowing you to connect to OpenAI’s new real-time API directly through our platform. In the meantime, you can deploy this Worker in your account to proxy the requests yourself.

If you have any questions, reach out on our Discord channel. We’re also hiring for AI Gateway, check out Cloudflare Jobs in Lisbon!

This is the New NVIDIA GB200 NVL4

2024-11-19 Patrick Kennedy

Post Syndicated from Patrick Kennedy original https://www.servethehome.com/this-is-the-new-nvidia-gb200-nvl4-arm-grace-blackwell/

We found the new NVIDIA GB200 NVL4 on the SC24 show floor the same day that it was announced showing off a new node design

The post This is the New NVIDIA GB200 NVL4 appeared first on ServeTheHome.

NVIDIA H200 NVL 4-Way Shown at OCP Summit 2024

2024-11-16 Eric Smith

Post Syndicated from Eric Smith original https://www.servethehome.com/nvidia-h200-nvl-4-way-shown-at-ocp-summit-2024/

We saw the NVIDIA H200 NVL solution at OCP Summit 2024 with four 141GB GPUs connected over NVLink for inferencing workloads

The post NVIDIA H200 NVL 4-Way Shown at OCP Summit 2024 appeared first on ServeTheHome.

New Shots of the NVIDIA HGX B200

2024-11-11 Eric Smith

Post Syndicated from Eric Smith original https://www.servethehome.com/new-shots-of-the-nvidia-hgx-b200-astera-labs/

We saw another NVIDIA HGX B200 8-GPU platform with bare NVLink Switch chips and Astera Labs PCIe retimers onboard

The post New Shots of the NVIDIA HGX B200 appeared first on ServeTheHome.

ASUS AMD EPYC CXL Memory Enabled Server AI and More OCP Summit 2024

2024-11-01 Eric Smith

Post Syndicated from Eric Smith original https://www.servethehome.com/asus-amd-epyc-cxl-memory-enabled-server-ai-ocp-summit-2024/

We check out four AMD EPYC 9005 servers from ASUS including AI GPU servers and the 2U 4-node ASUS RS520QA-E13-RS8U with CXL memory expansion

The post ASUS AMD EPYC CXL Memory Enabled Server AI and More OCP Summit 2024 appeared first on ServeTheHome.

HPE Cray XD670 NVIDIA HGX with CoolIT Liquid Cooling Shown

2024-10-31 Eric Smith

Post Syndicated from Eric Smith original https://www.servethehome.com/hpe-cray-xd670-nvidia-hgx-coolit-liquid-cooling-gigabyte-intel-ocp-shown/

We take a look at the HPE Cray XD670, a NVIDIA HGX H100 and H200 system with CoolIT liquid-cooling and discuss how it compares

The post HPE Cray XD670 NVIDIA HGX with CoolIT Liquid Cooling Shown appeared first on ServeTheHome.

Inside the 100K GPU xAI Colossus Cluster that Supermicro Helped Build for Elon Musk

2024-10-28 Patrick Kennedy

Post Syndicated from Patrick Kennedy original https://www.servethehome.com/inside-100000-nvidia-gpu-xai-colossus-cluster-supermicro-helped-build-for-elon-musk/

We get inside access into the 100K NVIDIA GPU xAI Colossus cluster that Supermicro built for Elon Musk’s teams

The post Inside the 100K GPU xAI Colossus Cluster that Supermicro Helped Build for Elon Musk appeared first on ServeTheHome.

LITEON Shows NVIDIA GB200 NVL72 Rack at OCP Summit 2024

2024-10-28 Patrick Kennedy

Post Syndicated from Patrick Kennedy original https://www.servethehome.com/liteon-shows-nvidia-gb200-nvl72-rack-at-ocp-summit-2024/

At OCP Summit 2024, we saw the new LITEON liquid cooled and powered rack for the NVIDIA GB200 NVL72 AI servers

The post LITEON Shows NVIDIA GB200 NVL72 Rack at OCP Summit 2024 appeared first on ServeTheHome.

Billions and billions (of logs): scaling AI Gateway with the Cloudflare Developer Platform

2024-10-24 Catarina Pires Mota

Post Syndicated from Catarina Pires Mota original https://blog.cloudflare.com/billions-and-billions-of-logs-scaling-ai-gateway-with-the-cloudflare

With the rapid advancements occurring in the AI space, developers face significant challenges in keeping up with the ever-changing landscape. New models and providers are continuously emerging, and understandably, developers want to experiment and test these options to find the best fit for their use cases. This creates the need for a streamlined approach to managing multiple models and providers, as well as a centralized platform to efficiently monitor usage, implement controls, and gather data for optimization.

AI Gateway is specifically designed to address these pain points. Since its launch in September 2023, AI Gateway has empowered developers and organizations by successfully proxying over 2 billion requests in just one year, as we highlighted during September’s Birthday Week. With AI Gateway, developers can easily store, analyze, and optimize their AI inference requests and responses in real time.

With our initial architecture, AI Gateway faced a significant challenge: the logs, those critical trails of data interactions between applications and AI models, could only be retained for 30 minutes. This limitation was not just a minor inconvenience; it posed a substantial barrier for developers and businesses needing to analyze long-term patterns, ensure compliance, or simply debug over more extended periods.

In this post, we’ll explore the technical challenges and strategic decisions behind extending our log storage capabilities from 30 minutes to being able to store billions of logs indefinitely. We’ll discuss the challenges of scale, the intricacies of data management, and how we’ve engineered a system that not only meets the demands of today, but is also scalable for the future of AI development.

Background

AI Gateway is built on Cloudflare Workers, a serverless platform that runs on the Cloudflare network, allowing developers to write small JavaScript functions that can execute at the point of need, near the user, on Cloudflare’s vast network of data centers, without worrying about platform scalability.

Our customers use multiple providers and models and are always looking to optimize the way they do inference. And, of course, in order to evaluate their prompts, performance, cost, and to troubleshoot what’s going on, AI Gateway’s customers need to store requests and responses. New requests show up within 15 seconds and customers can check a request’s cost, duration, number of tokens, and provide their feedback (thumbs up or down).

This scales in a way where an account can have multiple gateways and each gateway has its own settings. In our first implementation, a backend worker was responsible for storing Real Time Logs and other background tasks. However, in the rapidly evolving domain of artificial intelligence, where real-time data is as precious as the insights it provides, managing log data efficiently becomes paramount. We recognized that to truly empower our users, we needed to offer a solution where logs weren’t just transient records but could be stored permanently. Permanent log storage means developers can now track the performance, security, and operational insights of their AI applications over time, enabling not only immediate troubleshooting but also longitudinal studies of AI behavior, usage trends, and system health.

The diagram above describes our old architecture, which could only store 30 minutes of data.

Tracing the path of a request through the AI Gateway, as depicted in the sequence above:

A developer sends a new inference request, which is first received by our Gateway Worker.
The Gateway Worker then performs several checks: it looks for cached results, enforces rate limits, and verifies any other configurations set by the user for their gateway. Provided all conditions are met, it forwards the request to the selected inference provider (in this diagram, OpenAI).
The inference provider processes the request and sends back the response.
Simultaneously, as the response is relayed back to the developer, the request and response details are also dispatched to our Backend Worker. This worker’s role is to manage and store the log of this transaction.

The challenge: Store two billion logs

First step: real-time logs

Initially, the AI Gateway project stored both request metadata and the actual request bodies in a D1 database. This approach facilitated rapid development in the project’s infancy. However, as customer engagement grew, the D1 database began to fill at an accelerating rate, eventually retaining logs for only 30 minutes at a time.

To mitigate this, we first optimized the database schema, which extended the log retention to one hour. However, we soon encountered diminishing returns due to the sheer volume of byte data from the request bodies. Post-launch, it became clear that a more scalable solution was necessary. We decided to migrate the request bodies to R2 storage, significantly alleviating the data load on D1. This adjustment allowed us to incrementally extend log retention to 24 hours.

Consequently, D1 functioned primarily as a log index, enabling users to search and filter logs efficiently. When users needed to view details or download a log, these actions were seamlessly proxied through to R2.

This dual-system approach provided us with the breathing room to contemplate and develop more sophisticated storage solutions for the future.

Second step: persistent logs and Durable Object transactional storage

As our traffic surged, we encountered a growing number of requests from customers wanting to access and compare older logs.

Upon learning that the Durable Objects team was seeking beta testers for their new Durable Objects with SQLite, we eagerly signed up.

Originally, we considered Durable Objects as the ideal solution for expanding our log storage capacity, which required us to shard the logs by a unique string. Initially, this string was the account ID, but during a mid-development load test, we hit a cap at 10 million logs per Durable Object. This limitation meant that each account could only support up to this number of logs.

Given our commitment to the DO migration, we saw an opportunity rather than a constraint. To overcome the 10 million log limit per account, we refined our approach to shard by both account ID and gateway name. This adjustment effectively raised the storage ceiling from 10 million logs per account to 10 million per gateway. With the default setting allowing each account up to 10 gateways, the potential storage for each account skyrocketed to 100 million logs.

This strategic pivot not only enabled us to store a significantly larger number of logs. But also enhanced our flexibility in gateway management. Now, when a gateway is deleted, we can simply remove the corresponding Durable Object.

Additionally, this sharding method isolates high-volume request scenarios. If one customer’s heavy usage slows down log insertion, it only impacts their specific Durable Object, thereby preserving performance for other customers.

Taking a glance at the revised architecture diagram, we replaced the Backend Worker with our newly integrated Durable Object. The rest of the request flow remains unchanged, including the concurrent response to the user and the interaction with the Durable Object, which occurs in the fourth step.

Leveraging Cloudflare’s network, our Gateway Worker operates near the user’s location, which in turn positions the user’s Durable Object close by. This proximity significantly enhances the speed of log insertion and query operations.

Third step: managing thousands of Durable Objects

As the number of users and requests on AI Gateway grows, managing each unique Durable Object (DO) becomes increasingly complex. New customers join continuously, and we needed an efficient method to track each DO, ensure users stay within their 10 gateway limit, and manage the storage capacity for free users.

To address these challenges, we introduced another layer of control with a new Durable Object we’ve named the Account Manager. The primary function of the Account Manager is straightforward yet crucial: it keeps user activities in check.

Here’s how it works: before any Gateway commits a new log to permanent storage, it consults the Account Manager. This check determines whether the gateway is allowed to insert the log based on the user’s current usage and entitlements. The Account Manager uses its own SQLite database to verify the total number of rows a user has and their service level. If all checks pass, it signals the Gateway that the log can be inserted. It was paramount to guarantee that this entire validation process occurred in the background, ensuring that the user experience remains seamless and uninterrupted.

The Account Manager stays updated by periodically receiving data from each Gateway’s Durable Object. Specifically, after every 1000 inference requests, the Gateway sends an update on its total rows to the Account Manager, which then updates its local records. This system ensures that the Account Manager has the most current data when making its decisions.

Additionally, the Account Manager is responsible for monitoring customer entitlements. It tracks whether an account is on a free or paid plan, how many gateways a user is permitted to create, and the log storage capacity allocated to each gateway.

Through these mechanisms, the Account Manager not only helps in maintaining system integrity but also ensures fair usage across all users of AI Gateway.

AI evaluations and Durable Objects sharding

As we continue to develop evaluations to fully automatic and, in the future, use Large Language Models (LLMs), we are now taking the first step towards this goal and launching the open beta phase of comprehensive AI evaluations, centered on Human-in-the-Loop feedback.

This feature empowers users to create bespoke datasets from their application logs, thereby enabling them to score and evaluate the performance, speed, and cost-effectiveness of their models, with a primary focus on LLMs and automated scoring, analyzing the performance of LLMs, providing developers with objective, data-driven insights to refine their models.

To do this, developers require a reliable logging mechanism that persists logs from multiple gateways, storing up to 100 million logs in total (10 million logs per gateway, across 10 gateways). This represents a significant volume of data, as each request made through the AI Gateway generates a log entry, with some log entries potentially exceeding 50 MB in size.

This necessity leads us to work on the expansion of log storage capabilities. Since log storage is limited to 10 million logs per gateway, in future iterations, we aim to scale this capacity by implementing sharded Durable Objects (DO), allowing multiple Durable Objects per gateway to handle and store logs. This scaling strategy will enable us to store significantly larger volumes of logs, providing richer data for evaluations (using LLMs as a judge or from user input), all through AI Gateway.

Coming Soon

We are working on improving our existing Universal Endpoint, the next step on an enhanced solution that builds on existing fallback mechanisms to offer greater resilience, flexibility, and intelligence in request management.

Currently, when a provider encounters an error or is unavailable, our system falls back to an alternative provider to ensure continuity. The improved Universal Endpoint takes this a step further by introducing automatic retry capabilities, allowing failed requests to be reattempted before fallback is triggered. This significantly improves reliability by handling transient errors and increasing the likelihood of successful request fulfillment. It will look something like this:

curl --location 'https://aig.example.com/' \
--header 'CF-AIG-TOKEN: Bearer XXXX' \
--header 'Content-Type: application/json' \
--data-raw '[
    {
        "id": "0001",
        "provider": "openai",
        "endpoint": "chat/completions",
        "headers": {
            "Authorization": "Bearer XXXX",
            "Content-Type": "application/json"
        },
        "query": {
            "model": "gpt-3.5-turbo",
            "messages": [
                {
                    "role": "user",
                    "content": "generate a prompt to create cloudflare random images"
                }
            ]
        },
        "option": {
            "retry": 2,
            "delay": 200,
            "onComplete": {
                "provider": "workers-ai",
                "endpoint": "@cf/stabilityai/stable-diffusion-xl-base-1.0",
                "headers": {
                    "Authorization": "Bearer A5UFQkHewHF1-sA3hTVQFaPxRuu5wmS0eJcCS_MC",
                    "Content-Type": "application/json"
                },
                "query": {
                    "messages": [
                        {
                            "role": "user",
                            "content": "<prompt-response id='\''0001'\'' />"
                        }
                    ]
                }
            }
        }
    },
    {
        "provider": "workers-ai",
        "endpoint": "@cf/stabilityai/stable-diffusion-xl-base-1.0",
        "headers": {
            "Authorization": "Bearer XXXXXX",
            "Content-Type": "application/json"
        },
        "query": {
            "messages": [
                {
                    "role": "user",
                    "content": "create a image of a missing cat"
                }
            ]
        }
    }
]'

The request to the improved Universal Endpoint system demonstrates how it handles multiple providers with integrated retry mechanisms and fallback logic. In this example, the first request is sent to a provider like OpenAI, asking it to generate a text-to-image prompt. The “retry” option ensures that transient issues don’t result in immediate failure.

The system’s ability to seamlessly switch between providers while applying retry strategies ensures higher reliability and robustness in managing requests. By leveraging fallback logic, the Improved Universal Endpoint can dynamically adapt to provider failures, ensuring that tasks are completed successfully even in complex, multi-step workflows.

In addition to retry logic, we will have the ability to inspect requests and responses and make dynamic decisions based on the content of the result. This enables developers to create conditional workflows where the system can adapt its behavior depending on the nature of the response, creating a highly flexible and intelligent decision-making process.

If you haven’t yet used AI Gateway, check out our developer documentation on how to get started. If you have any questions, reach out on our Discord channel.

The Top NVIDIA HGX B200 Server Supermicro SYS-422GA-NBRT-LCC at OCP 2024

2024-10-19 Patrick Kennedy

Post Syndicated from Patrick Kennedy original https://www.servethehome.com/the-top-nvidia-hgx-b200-server-supermicro-sys-422ga-nbrt-lcc-intel-ocp-2024/

We saw the Supermicro SYS-422GA-NBRT-LCC the next generation of the company’s industry-leading liquid-cooled NVIDIA HGX B200 platform

The post The Top NVIDIA HGX B200 Server Supermicro SYS-422GA-NBRT-LCC at OCP 2024 appeared first on ServeTheHome.

Meta Announces AMD Instinct MI300X for AI Inference and NVIDIA GB200 Catalina

2024-10-15 Patrick Kennedy

Post Syndicated from Patrick Kennedy original https://www.servethehome.com/meta-announces-amd-mi300x-for-ai-inference-marvell-fbnic-cisco-arista-broadcom/

Meta outlined its AI platforms at OCP Summit 2024, including GPUs from AMD and NVIDIA and networking from Marvell, Broadcom, Cisco and Arista

The post Meta Announces AMD Instinct MI300X for AI Inference and NVIDIA GB200 Catalina appeared first on ServeTheHome.

AMD Pensando Pollara 400 UltraEthernet RDMA NIC Launched

2024-10-15 Patrick Kennedy

Post Syndicated from Patrick Kennedy original https://www.servethehome.com/amd-pensando-pollara-400-ultraethernet-rdma-nic-launched/

The new AMD Pensando Pollara 400 is a 400GBE UltraEthernet Consortium RDMA NIC designed for AI servers and next-gen UEC networks

The post AMD Pensando Pollara 400 UltraEthernet RDMA NIC Launched appeared first on ServeTheHome.

AMD Instinct MI325X Launched and the MI355X is Coming

2024-10-12 Cliff Robinson

Post Syndicated from Cliff Robinson original https://www.servethehome.com/amd-instinct-mi325x-launched-and-the-mi355x-is-coming/

This week saw the launch of the AMD Instinct MI325X, a 256GB HBM3E accelerator that is an update to the MI300X. AMD also talked 2025 MI355X

The post AMD Instinct MI325X Launched and the MI355X is Coming appeared first on ServeTheHome.

This is the New Astera Labs Scorpio PCIe Switch Targeting Broadcom in AI Servers

2024-10-08 John Lee

Post Syndicated from John Lee original https://www.servethehome.com/this-is-the-new-astera-labs-scorpio-pcie-switch-targeting-broadcom-in-ai-servers/

This is the new Astera Labs Scorpio PCIe switch designed to displace Broadcom in AI servers connecting CPU, GPU, NIC, and SSDs together

The post This is the New Astera Labs Scorpio PCIe Switch Targeting Broadcom in AI Servers appeared first on ServeTheHome.

New Microsoft Azure NVIDIA GB200 Systems Shown as Two-Thirds Cooling

2024-10-08 Cliff Robinson

Post Syndicated from Cliff Robinson original https://www.servethehome.com/new-microsoft-azure-nvidia-gb200-systems-shown/

Microsoft Azure NVIDIA GB200 systems are huge, with two thirds of the aisle space being dedicated to cooling the NVIDIA rack

The post New Microsoft Azure NVIDIA GB200 Systems Shown as Two-Thirds Cooling appeared first on ServeTheHome.

Blending Zabbix and AI with Tomáš Heřmánek

2024-10-01 Michael Kammer

Post Syndicated from Michael Kammer original https://blog.zabbix.com/blending-zabbix-and-ai-with-tomas-hermanek/28832/

Zabbix Summit 2024 is only a few days away, which means that it’s time for the last of our interviews with Summit speakers. Our final chat this year is with Tomáš Heřmánek, the CEO and Founder of initMAX s.r.o. We asked him about his beginnings in the tech industry, how he got started with Zabbix, and how AI will change the game for monitoring in general and Zabbix in particular.

Please tell us a bit about yourself and the journey that led you to initMAX.

My journey in the IT field started with small ISPs and later took a significant leap into the world of Linux and application management, where the need for effective monitoring became evident. I worked for a company that prioritized high-quality open-source solutions, and it was during this time that we adopted Zabbix version 1.8 as a replacement for Nagios, which we found to be inflexible. Shortly after our deployment, Zabbix 2.0 was released. It introduced JMX monitoring, which was crucial for us. Since then, Zabbix has been our go-to solution for monitoring.

I set a personal goal to master this outstanding monitoring system and participated in the first official Zabbix training in the Czech Republic, where I earned my initial certifications as a Zabbix Specialist and Professional on version 3.0. The training experience drew me deeper into the world of Zabbix, especially after meeting a burgeoning group of enthusiasts in the country. I felt compelled to give back to the community that had supported me.

How long have you been using Zabbix? What kind of Zabbix-related tasks does your team tackle on a daily basis?

When I started my own company, becoming a Zabbix partner was a natural choice. To further contribute to the community, I pursued the Expert and Trainer certifications. It was the most challenging 14 days of my life, but it was worth it. For anyone serious about Zabbix, I highly recommend participating in official training sessions and actively engaging with the community through forums, local groups, Telegram, WhatsApp, blogs, and forums. This commitment to support and strengthen the community further.is also why we created our own wiki, which is accessible to everyone without restrictions.

Can you give us a few clues about what we can expect to hear during your Zabbix Summit presentation?

This year, I have prepared a demonstration for the Zabbix Summit showcasing how we integrate AI into our operations, including various modifications to the web interface that allow us to automate and streamline routine tasks. Besides showcasing these innovations, we will also be making some parts of our work available to the public. The main focus of my presentation will be on problem identification, automating the creation of preprocessing steps, and using a chatbot for creating hosts, reading configurations, and making modifications. Essentially, it’s a smart assistant and guide all in one.

The final section, which we find the most challenging, deals with automated event correlation and the creation of a topology, from which correlations partially derive and evaluate. We are using the new Zabbix 7.0 feature – root cause and symptoms – for visualization in Zabbix. Our goal is to showcase not only the capabilities of Zabbix in combination with AI, but also to contribute back to the community by sharing some of these developments freely.

In your experience, does Zabbix lend itself easily to enhancement via AI?

AI is something that truly fascinates us and is currently shaping the world. From our experience, we believe that the possibilities are limited only by our imagination. In the future, I can envision AI autonomously discovering elements that need to be monitored, integrating them into Zabbix, and configuring everything necessary for effective monitoring.

What changes do you think AI will bring to the world of monitoring in general over the next decade or so?

I foresee a shift in our roles, moving away from traditional IT tasks towards a focus on idea generation, control, and the customization of artificial intelligence. As AI continues to evolve, it will not only enhance automation but also empower us to explore and implement innovative solutions more effectively.

The post Blending Zabbix and AI with Tomáš Heřmánek appeared first on Zabbix Blog.

Wrapping up another Birthday Week celebration

2024-09-30 Kelly May Johnston

Post Syndicated from Kelly May Johnston original https://blog.cloudflare.com/birthday-week-2024-wrap-up

2024 marks Cloudflare’s 14th birthday. Birthday Week each year is packed with major announcements and the release of innovative new offerings, all focused on giving back to our customers and the broader Internet community. Birthday Week has become a proud tradition at Cloudflare and our culture, to not just stay true to our mission, but to always stay close to our customers. We begin planning for this week of celebration earlier in the year and invite everyone at Cloudflare to participate.

Months before Birthday Week, we invited teams to submit ideas for what to announce. We were flooded with submissions, from proposals for implementing new standards to creating new products for developers. Our biggest challenge is finding space for it all in just one week — there is still so much to build. Good thing we have a birthday to celebrate each year, but we might need an extra day in Birthday Week next year!

In case you missed it, here’s everything we announced during 2024’s Birthday Week:

Monday

What	In a sentence…
Start auditing and controlling the AI models accessing your content	Understand which AI-related bots and crawlers can access your website, and which content you choose to allow them to consume.
Making zone management more efficient with batch DNS record updates	Customers using Cloudflare to manage DNS can create a whole batch of records, enable proxying on many records, update many records to point to a new target at the same time, or even delete all of their records.
Introducing Ephemeral IDs: a new tool for fraud detection	Taking the next step in advancing security with Ephemeral IDs, a new feature that generates a unique short-lived ID, without relying on any network-level information.

Tuesday

What	In a sentence…
Cloudflare partners to deliver safer browsing experience to homes	Internet service, network, and hardware equipment providers can sign up and partner with Cloudflare to deliver a safer browsing experience to homes.
A safer Internet with Cloudflare: free threat intelligence, analytics, and new threat detections	Free threat intelligence, analytics, new threat detections, and more.
Automatically generating Cloudflare’s Terraform provider	The last pieces of the OpenAPI schemas ecosystem to now be automatically generated — the Terraform provider and API reference documentation.
Cloudflare helps verify the security of end-to-end encrypted messages by auditing key transparency for WhatsApp	Cloudflare helps verify the security of end-to-end encrypted messages by auditing key transparency for WhatsApp.

Wednesday

What	In a sentence…
Introducing Speed Brain: helping web pages load 45% faster	Speed Brain, our latest leap forward in speed, uses the Speculation Rules API to prefetch content for users’ likely next navigations — downloading web pages before they navigate to them and making pages load 45% faster.
Instant Purge: invalidating cached content in under 150ms	Instant Purge invalidates cached content in under 150ms, offering the industry’s fastest cache purge with global latency for purges by tags, hostnames, and prefixes.
New standards for a faster and more private Internet	Zstandard compression, Encrypted Client Hello, and more speed and privacy announcements all released for free.
TURN and anycast: making peer connections work globally	Starting today, Cloudflare Calls’ TURN service is now generally available to all Cloudflare accounts.
Cloudflare’s 12th Generation servers — 145% more performant and 63% more efficient	Next generation servers focused on exceptional performance and security, enhanced support for AI/ML workloads, and significant strides in power efficiency.

Thursday

What	In a sentence…
Startup Program revamped: build and grow on Cloudflare with up to $250,000 in credits	Eligible startups can now apply to receive up to $250,000 in credits to build using Cloudflare’s Developer Platform.
Cloudflare’s bigger, better, faster AI platform	More powerful GPUs, expanded model support, enhanced logging and evaluations in AI Gateway, and Vectorize GA with larger index sizes and faster queries.
Builder Day 2024: 18 big updates to the Workers platform	Persistent and queryable Workers logs, Node.js compatibility GA, improved Next.js support via OpenNext, built-in CI/CD for Workers, Gradual Deployments, Queues, and R2 Event Notifications GA, and more — making building on Cloudflare easier, faster, and more affordable.
Faster Workers KV	A deep dive into how we made Workers KV up to 3x faster.
Zero-latency SQLite storage in every Durable Object	Putting your application code into the storage layer, so your code runs where the data is stored.
Making Workers AI faster and more efficient: Performance optimization with KV cache compression and speculative decoding	Using new optimization techniques such as KV cache compression and speculative decoding, we’ve made large language model (LLM) inference lightning-fast on the Cloudflare Workers AI platform.

Friday

What	In a sentence…
Our container platform is in production. It has GPUs. Here’s an early look.	We’ve been working on something new — a platform for running containers across Cloudflare’s network. We already use it in production, for AI inference and more.
Advancing cybersecurity: Cloudflare implements a new bug bounty VIP program as part of CISA Pledge commitment	We implemented a new bug bounty VIP program this year as part of our CISA Pledge commitment.
Empowering builders: introducing the Dev Alliance and Workers Launchpad Cohort #4	Get free and discounted access to essential developer tools and meet the latest set of incredible startups building on Cloudflare.
Expanding our support for open source projects with Project Alexandria	Expanding our open source program and helping projects have a sustainable and scalable future, providing tools and protection needed to thrive.
Network trends and natural language: Cloudflare Radar’s new Data Explorer & AI Assistant	A simple Web-based interface to build more complex API queries, including comparisons and filters, and visualize the results.
AI Everywhere with the WAF Rule Builder Assistant, Cloudflare Radar AI Insights, and updated AI bot protection	Extending our AI Assistant capabilities to help you build new WAF rules, added new AI bot and crawler traffic insights to Radar, and new AI bot blocking capabilities.
Reaffirming our commitment to Free	Our free plan is here to stay, and we reaffirm that commitment this week with 15 releases that make the Free plan even better.

One more thing…

Cloudflare serves millions of customers and their millions of domains across nearly every country on Earth. However, as a global company, the payment landscape can be complex — especially in regions outside of North America. While credit cards are very popular for online purchases in the US, the global picture is quite different. 60% of consumers across EMEA, APAC and LATAM choose alternative payment methods. For instance, European consumers often opt for SEPA Direct Debit, a bank transfer mechanism, while Chinese consumers frequently use Alipay, a digital wallet.

At Cloudflare, we saw this as an opportunity to meet customers where they are. Today, we’re thrilled to announce that we are expanding our payment system and launching a closed beta for a new payment method called Stripe Link. The checkout experience will be faster and more seamless, allowing our self-serve customers to pay using saved bank accounts or cards with Link. Customers who have saved their payment details at any business using Link can quickly check out without having to reenter their payment information.

These are the first steps in our efforts to expand our payment system to support global payment methods used by customers around the world. We’ll be rolling out new payment methods gradually, ensuring a smooth integration and gathering feedback from our customers every step of the way.

Until next year

That’s all for Birthday Week 2024. However, the innovation never stops at Cloudflare. Continue to follow the Cloudflare Blog all year long as we launch more products and features that help build a better Internet.