Post Syndicated from daroc original https://lwn.net/Articles/994704/

The Image-Based Linux Summit has by now established itself as a yearly event.
Following on from last year’s edition,
the third edition was held in Berlin on September 24, the
day before

All Systems Go! 2024 (ASG). The purpose of this event is to gather
stakeholders from various engineering groups and hold friendly but lively
discussions around the topic of image-based Linux — that is, Linux distributions
based around immutable images, instead of mutable root filesystems.

Infor’s Amazon OpenSearch Service Modernization: 94% faster searches and 50% lower costs

2024-10-22 Allan Pienaar

Post Syndicated from Allan Pienaar original https://aws.amazon.com/blogs/big-data/infors-amazon-opensearch-service-modernization-94-faster-searches-and-50-lower-costs/

This post is cowritten by Arjan Hammink from Infor.

Robust storage and search capabilities are critical components of Infor’s enterprise business cloud software. Infor’s Intelligent Open Network (ION) OneView platform provides real-time reporting, dashboards, and data visualization to help customers access and analyze information across their organization. To enhance the search functionality within ION OneView, Infor used Amazon OpenSearch Service to improve their software products and offer better service to their customers by providing real-time visibility. By modernizing their use of OpenSearch Service, Infor has been able to deliver a 94% improvement in search performance for customers, along with a 50% reduction in storage costs.

In this post, we’ll explore Infor’s journey to modernize its search capabilities, the key benefits they achieved, and the technologies that powered this transformation. We’ll also discuss how Infor’s customers are now able to more effectively search through business messages, documents, and other critical data within the ION OneView platform.

Where Infor started

Infor’s ION OneView was built on top of Elasticsearch v5.x on Amazon OpenSearch Service, hosted across eight AWS Regions. This architecture enabled users to track business documents from a consolidated view, search using various criteria, and correlate messages while viewing content based on user roles. Over time, Infor expanded its functionality to include “Enrich” and “Archive” capabilities, which added significant complexity. The Enrich process would build searchable messages by aggregating related events, requiring constant document updates to the OpenSearch indices. The Archive process would then move these messages and events to Amazon Simple Storage Service (Amazon S3), while using a delete_by_query to remove the corresponding documents from OpenSearch Service. These read-update-write-delete workloads, coupled with large all-encompassing indices with shard sizes of over 100GB, resulted in high volumes of deleted documents and exponential data growth that the system struggled to keep up with. To address increasing performance needs, Infor continually horizontally scaled out their OpenSearch Service domain.

Challenges

The key challenges Infor faced underscored the need for a more scalable, resilient, and cost-effective search capability that could seamlessly integrate with their cloud environment. These included the inability to effectively archive data because of high ingestion rates, resulting in longer upgrade and recovery times. Escalating costs from scaling the solution and the need for custom development to enable newer OpenSearch Service features created significant operational burdens. Additionally, Infor was seeing increasing search latency, with CPU utilization peaking at 75% and occasionally spiking above 90% (as shown in the following figures), demonstrating the performance limitations of Infor’s existing infrastructure. Collectively, these issues drove Infor’s need for a modernized search solution.

SearchLatency Pre-Modernization

CPUUtilization Pre-Modernization

Infor’s journey to modernize search with OpenSearch Service

To address the growing challenges with ION OneView, Infor partnered with AWS to undertake a comprehensive modernization effort. This involved optimizing operational processes, storage configurations, and instance selections, while also upgrading to the later versions within OpenSearch Service.

Operational review and enhancements

As a collaborative effort between Infor and AWS, a comprehensive operational review of Infor’s OpenSearch Service cluster was undertaken. With the help of slow logs and adjusting the logging thresholds, the review was able to identify long-running queries and the archival process consuming the largest amount of CPU capacity. Infor rewrote the long-running queries that used high cardinality fields, reducing the average query time.

Next, the team turned their attention to redesigning Infor’s archival process to reduce stress on the CPU. Instead of a single large index, we implemented independent indices based on customer license types. This improved delete performance by allowing the team to target old indices, using index aliases to manage the transition. We also replaced the delete_by_query approach where a query is sent to locate documents prior to a delete with a standard delete passing document IDs directly, because all the document IDs to be archived were known ahead of time. This reduced round-trip time and CPU stress compared to the sequential search requests performed by delete_by_query. This was followed by the tuning of the refresh interval based on the workload requirements, improving the indexing performance, and memory and CPU utilization.

Storage optimization

The team switched from GP2 to GP3 storage, provisioning additional input/output operations per second (IOPS) and throughput only when needed. This resulted in a 9% reduction in storage costs for most of Infor’s workloads. In all use cases where IOPS was a bottleneck, the team was able to provision additional IOPS and throughput independent of the volume size using GP3, further reducing Infor’s overall storage costs. Additionally, we implemented a shard size-based rollover strategy that provided a sharding strategy where total shards were divisible by the number of nodes to reduce the shard size to the recommended number of less than 50 GiB. This helped ensure an even distribution of data and workloads across the nodes for each index, and the performance improvements indicated that more vCPU would be beneficial given the thread pool queues and latencies. Appropriate master and data node instance types were chosen based on the new storage requirements. To support the reindexing process, the team also temporarily scaled up the storage and compute resources.

Upgrading OpenSearch Service

After optimizing the storage and compute configurations based on best practices, the Infor ION team turned their attention to using the latest features of OpenSearch Service. With the shards now at an appropriate boundary and the memory and CPU utilization at the right levels, the team was able to seamlessly upgrade from Elasticsearch version 5.x to 6.x and then to 7.x in OpenSearch Service. Each major version upgrade required careful testing and client-side code changes to make sure that the appropriate compatible client libraries were used, and the team took the necessary time after each upgrade to thoroughly validate the system and provide a smooth transition for Infor’s customers. This commitment to a methodical upgrade process allowed Infor to take advantage of the latest OpenSearch Service features, such as Graviton support, performance improvements, bug fixes, and security posture improvements, while minimizing disruption to their users.

Optimizing instance selection for performance

In collaboration with the AWS team, Infor carefully evaluated local non-volatile memory express (NVMe)-backed instance types for their ION OneView search cluster, comparing options such as i3 and R6gd instances to balance memory, latency, and storage requirements. For write-heavy workloads, the team found that using NVMe storage provided better performance and price compared to Amazon Elastic Block Store (Amazon EBS) volumes because of the high IOPS requirement of the workload, allowing them to be less reliant on off-heap memory usage. By selecting the most appropriate instance types, the ION OneView search cluster was able to resize and scale down the number of data nodes by 63% while still achieving improved throughput and reduced latency. Staying on the latest AWS instance families was also a key consideration, and the team further optimized costs by purchasing Reserved Instances after establishing a good baseline for their performance and compute consumption, with discounts ranging from 30% to 50% depending on the commitment term.

Results

The following figures show the improvements of the modernization.

New indices with the correct shard size can be seen in the increase in shards, shown in the following figure.

The updated shard strategy combined with a version upgrade led to a ten-fold increase in the volume of traffic and efficient archiving as shown in the following figure.

The SearchRate increase is shown in the following figure.

The following figure shows that the CPU increase was minimal compared to the traffic increase.

The SearchLatency reduction post upgrade and implementation of the new indexing and shard strategy is shown in the following figure.

The following figure shows the monthly spend over the past 4 quarters for two Infor ION products.

Conclusion

Through their careful modernization of the OpenSearch Service infrastructure, Infor was able to achieve 50% reduction in infrastructure costs coupled with a 94% improvement in cluster performance. The optimized clusters are now healthier and more resilient, enabling faster blue/green deployments to process even greater data volumes.

This successful transformation was driven by Infor’s close collaboration with the AWS team, using deep technical expertise and best practices to accelerate the optimization process and unlock the full potential of OpenSearch Service. Infor’s OpenSearch Service modernization has empowered the company to provide an improved, high-performing search experience for their customers at a significantly lower cost, positioning their ION OneView platform for continued growth and success.

Every workload is unique, with its own distinct characteristics. While the best practices outlined in the Amazon OpenSearch Service developer guide serve as a valuable guide, the most important step is to deploy, test, and continuously tune your own domains to find the optimal configuration, stability, and cost for your specific needs.

About the Authors

Allan Pienaar is an OpenSearch SME and Customer Success Engineer at AWS. He works closely with enterprise customers in ensuring operational excellence, maintaining production stability and optimizing cost using the Amazon OpenSearch Service.

Gokul Sarangaraju is a Senior Solutions Architect at AWS. He helps customers adopt AWS services and provides guidance in AWS cost and usage optimization. His areas of expertise include building scalable and cost-effective data analytics solutions using AWS services and tools.

Arjan Hammink is a Senior Director of Software Development at Infor, bringing over 25 years of expertise in software development and team management. He currently oversees Infor ION, a project he has been integral to since its inception in 2010 when he began as a Software Engineer. Infor ION is a robust middleware designed to streamline software integration, a key component of Infor OS, Infor’s cloud technology platform.

Upgraded Claude 3.5 Sonnet from Anthropic (available now), computer use (public beta), and Claude 3.5 Haiku (coming soon) in Amazon Bedrock

2024-10-22 Danilo Poccia

Post Syndicated from Danilo Poccia original https://aws.amazon.com/blogs/aws/upgraded-claude-3-5-sonnet-from-anthropic-available-now-computer-use-public-beta-and-claude-3-5-haiku-coming-soon-in-amazon-bedrock/

Four months ago, we introduced Anthropic’s Claude 3.5 in Amazon Bedrock, raising the industry bar for AI model intelligence while maintaining the speed and cost of Claude 3 Sonnet.

Today, I am excited to announce three new capabilities for the Claude 3.5 model family in Amazon Bedrock:

Upgraded Claude 3.5 Sonnet – You now have access to an upgraded Claude 3.5 Sonnet model that builds upon its predecessor’s strengths, offering even more intelligence at the same cost. Claude 3.5 Sonnet continues to improve its capability to solve real-world software engineering tasks and follow complex, agentic workflows. The upgraded Claude 3.5 Sonnet helps across the entire software development lifecycle, from initial design to bug fixes, maintenance, and optimizations. With these capabilities, the upgraded Claude 3.5 Sonnet model can help build more advanced chatbots with a warm, human-like tone. Other use cases in which the upgraded model excels include knowledge Q&A platforms, data extraction from visuals like charts and diagrams, and automation of repetitive tasks and operations.

Computer use – Claude 3.5 Sonnet now offers computer use capabilities in Amazon Bedrock in public beta, allowing Claude to perceive and interact with computer interfaces. Developers can direct Claude to use computers the way people do: by looking at a screen, moving a cursor, clicking buttons, and typing text. This works by giving the model access to integrated tools that can return computer actions, like keystrokes and mouse clicks, editing text files, and running shell commands. Software developers can integrate computer use in their solutions by building an action-execution layer and grant screen access to Claude 3.5 Sonnet. In this way, software developers can build applications with the ability to perform computer actions, follow multiple steps, and check their results. Computer use opens new possibilities for AI-powered applications. For example, it can help automate software testing and back office tasks and implement more advanced software assistants that can interact with applications. Given this technology is early, developers are encouraged to explore lower-risk tasks and use it in a sandbox environment.

Claude 3.5 Haiku – The new Claude 3.5 Haiku is coming soon and combines rapid response times with improved reasoning capabilities, making it ideal for tasks that require both speed and intelligence. Claude 3.5 Haiku improves on its predecessor and matches the performance of Claude 3 Opus (previously Claude’s largest model) at the speed and cost of Claude 3 Haiku. Claude 3.5 Haiku can help with use cases such as fast and accurate code suggestions, highly interactive chatbots that need rapid response times for customer service, e-commerce solutions, and educational platforms. For customers dealing with large volumes of unstructured data in finance, healthcare, research, and more, Claude 3.5 Haiku can help efficiently process and categorize information.

According to Anthropic, the upgraded Claude 3.5 Sonnet delivers across-the-board improvements over its predecessor, with significant gains in coding, an area where it already excelled. The upgraded Claude 3.5 Sonnet shows wide-ranging improvements on industry benchmarks. On coding, it improves performance on SWE-bench Verified from 33% to 49%, scoring higher than all publicly available models. It also improves performance on TAU-bench, an agentic tool use task, from 62.6% to 69.2% in the retail domain, and from 36.0% to 46.0% in the airline domain. The following table includes the model evaluations provided by Anthropic.

Computer use, a new frontier in AI interaction
Instead of restricting the model to use APIs, Claude has been trained on general computer skills, allowing it to use a wide range of standard tools and software programs. In this way, applications can use Claude to perceive and interact with computer interfaces. Software developers can integrate this API to enable Claude to translate prompts (for example, “find me a hotel in Rome”) into specific computer commands (open a browser, navigate this website, and so on).

More specifically, when invoking the model, software developers now have access to three new integrated tools that provide a virtual set of hands to operate a computer:

Computer tool – This tool can receive as input a screenshot and a goal and returns a description of the mouse and keyboard actions that should be performed to achieve that goal. For example, this tool can ask to move the cursor to a specific position, click, type, and take screenshots.
Text editor tool – Using this tool, the model can ask to perform operations like viewing file contents, creating new files, replacing text, and undoing edits.
Bash tool – This tool returns commands that can be run on a computer system to interact at a lower level as a user typing in a terminal.

These tools open up a world of possibilities for automating complex tasks, from data analysis and software testing to content creation and system administration. Imagine an application powered by Claude 3.5 Sonnet interacting with the computer just as a human would, navigating through multiple desktop tools including terminals, text editors, internet browsers, and also capable of filling out forms and even debugging code.

We’re excited to help software developers explore these new capabilities with Amazon Bedrock. We expect this capability to improve rapidly in the coming months, and Claude’s current ability to use computers has limits. Some actions such as scrolling, dragging, or zooming can present challenges for Claude, and we encourage you to start exploring low-risk tasks.

When looking at OSWorld, a benchmark for multimodal agents in real computer environments, the upgraded Claude 3.5 Sonnet currently gets 14.9%. While human-level skill is far ahead with about 70-75%, this result is much better than the 7.7% obtained by the next-best model in the same category.

Using the upgraded Claude 3.5 Sonnet in the Amazon Bedrock console
To get started with the upgraded Claude 3.5 Sonnet, I navigate to the Amazon Bedrock console and choose Model access in the navigation pane. There, I request access for the new Claude 3.5 Sonnet V2 model.

To test the new vision capability, I open another browser tab and download from the Our World in Data website the Wind power generation chart in PNG format.

Back in the Amazon Bedrock console, I choose Chat/text under Playgrounds in the navigation pane. For the model, I select Anthropic as the model provider and then Claude 3.5 Sonnet V2.

I use the three vertical dots in the input section of the chat to upload the image file from my computer. Then I enter this prompt:

Which are the top countries for wind power generation? Answer only in JSON.

The result follows my instructions and returns the list extracting the information from the image.

Using the upgraded Claude 3.5 Sonnet with AWS CLI and SDKs
Here’s a sample AWS Command Line Interface (AWS CLI) command using the Amazon Bedrock Converse API. I use the --query parameter of the CLI to filter the result and only show the text content of the output message:

aws bedrock-runtime converse \
    --model-id anthropic.claude-3-5-sonnet-20241022-v2:0 \
    --messages '[{ "role": "user", "content": [ { "text": "What do you throw out when you want to use it, but take in when you do not want to use it?" } ] }]' \
    --query 'output.message.content[*].text' \
    --output text

In output, I get this text in the response.

An anchor! You throw an anchor out when you want to use it to stop a boat, but you take it in (pull it up) when you don't want to use it and want to move the boat.

The AWS SDKs implement a similar interface. For example, you can use the AWS SDK for Python (Boto3) to analyze the same image as in the console example:

import boto3

MODEL_ID = "anthropic.claude-3-5-sonnet-20241022-v2:0"
IMAGE_NAME = "wind-generation.png"

bedrock_runtime = boto3.client("bedrock-runtime")

with open(IMAGE_NAME, "rb") as f:
    image = f.read()

user_message = "Which are the top countries for wind power generation? Answer only in JSON."

messages = [
    {
        "role": "user",
        "content": [
            {"image": {"format": "png", "source": {"bytes": image}}},
            {"text": user_message},
        ],
    }
]

response = bedrock_runtime.converse(
    modelId=MODEL_ID,
    messages=messages,
)
response_text = response["output"]["message"]["content"][0]["text"]
print(response_text)

Integrating computer use with your application
Let’s see how computer use works in practice. First, I take a snapshot of the desktop of a Ubuntu system:

This screenshot is the starting point for the steps that will be implemented by computer use. To see how that works, I run a Python script passing in input to the model the screenshot image and this prompt:

Find me a hotel in Rome.

This script invokes the upgraded Claude 3.5 Sonnet in Amazon Bedrock using the new syntax required for computer use:

import base64
import json
import boto3

MODEL_ID = "anthropic.claude-3-5-sonnet-20241022-v2:0"

IMAGE_NAME = "ubuntu-screenshot.png"

bedrock_runtime = boto3.client(
    "bedrock-runtime",
    region_name="us-east-1",
)

with open(IMAGE_NAME, "rb") as f:
    image = f.read()

image_base64 = base64.b64encode(image).decode("utf-8")

prompt = "Find me a hotel in Rome."

body = {
    "anthropic_version": "bedrock-2023-05-31",
    "max_tokens": 512,
    "temperature": 0.5,
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {
                    "type": "image",
                    "source": {
                        "type": "base64",
                        "media_type": "image/jpeg",
                        "data": image_base64,
                    },
                },
            ],
        }
    ],
    "tools": [
        { # new
            "type": "computer_20241022", # literal / constant
            "name": "computer", # literal / constant
            "display_height_px": 1280, # min=1, no max
            "display_width_px": 800, # min=1, no max
            "display_number": 0 # min=0, max=N, default=None
        },
        { # new
            "type": "bash_20241022", # literal / constant
            "name": "bash", # literal / constant
        },
        { # new
            "type": "text_editor_20241022", # literal / constant
            "name": "str_replace_editor", # literal / constant
        }
    ],
    "anthropic_beta": ["computer-use-2024-10-22"],
}

# Convert the native request to JSON.
request = json.dumps(body)

try:
    # Invoke the model with the request.
    response = bedrock_runtime.invoke_model(modelId=MODEL_ID, body=request)

except Exception as e:
    print(f"ERROR: {e}")
    exit(1)

# Decode the response body.
model_response = json.loads(response["body"].read())
print(model_response)

The body of the request includes new options:

anthropic_beta with value ["computer-use-2024-10-22"] to enable computer use.
The tools section supports a new type option (set to custom for the tools you configure).
Note that the computer tool needs to know the resolution of the screen (display_height_px and display_width_px).

To follow my instructions with computer use, the model provides actions that operate on the desktop described by the input screenshot.

The response from the model includes a tool_use section from the computer tool that provides the first step. The model has found in the screenshot the Firefox browser icon and the position of the mouse arrow. Because of that, it now asks to move the mouse to specific coordinates to start the browser.

{
    "id": "msg_bdrk_01WjPCKnd2LCvVeiV6wJ4mm3",
    "type": "message",
    "role": "assistant",
    "model": "claude-3-5-sonnet-20241022",
    "content": [
        {
            "type": "text",
            "text": "I'll help you search for a hotel in Rome. I see Firefox browser on the desktop, so I'll use that to access a travel website.",
        },
        {
            "type": "tool_use",
            "id": "toolu_bdrk_01CgfQ2bmQsPFMaqxXtYuyiJ",
            "name": "computer",
            "input": {"action": "mouse_move", "coordinate": [35, 65]},
        },
    ],
    "stop_reason": "tool_use",
    "stop_sequence": None,
    "usage": {"input_tokens": 3443, "output_tokens": 106},
}

This is just the first step. As with usual tool use requests, the script should reply with the result of using the tool (moving the mouse in this case). Based on the initial request to book a hotel, there would be a loop of tool use interactions that will ask to click on the icon, type a URL in the browser, and so on until the hotel has been booked.

A more complete example is available in this repository shared by Anthropic.

Things to know
The upgraded Claude 3.5 Sonnet is available today in Amazon Bedrock in the US West (Oregon) AWS Region and is offered at the same cost as the original Claude 3.5 Sonnet. For up-to-date information on regional availability, refer to the Amazon Bedrock documentation. For detailed cost information for each Claude model, visit the Amazon Bedrock pricing page.

In addition to the greater intelligence of the upgraded model, software developers can now integrate computer use (available in public beta) in their applications to automate complex desktop workflows, enhance software testing processes, and create more sophisticated AI-powered applications.

Claude 3.5 Haiku will be released in the coming weeks, initially as a text-only model and later with image input.

You can see how computer use can help with coding in this video with Alex Albert, Head of Developer Relations at Anthropic.

This other video describes computer use for automating operations.

To learn more about these new features, visit the Claude models section of the Amazon Bedrock documentation. Give the upgraded Claude 3.5 Sonnet a try in the Amazon Bedrock console today, and send feedback to AWS re:Post for Amazon Bedrock. You can find deep-dive technical content and discover how our Builder communities are using Amazon Bedrock at community.aws. Let us know what you build with these new capabilities!

– Danilo

Beelink SER9 AMD Ryzen AI 9 HX 370 Mini PC Review

2024-10-22 Patrick Kennedy

Post Syndicated from Patrick Kennedy original https://www.servethehome.com/beelink-ser9-amd-ryzen-ai-9-hx-370-mini-pc-review/

In our Beelink SER9 review, we see how this AMD Ryzen AI 9 HX 370 mini PC fares as the fastest AMD mini PC out there with caveats of course

The post Beelink SER9 AMD Ryzen AI 9 HX 370 Mini PC Review appeared first on ServeTheHome.

Introducing an enhanced in-console editing experience for AWS Lambda

2024-10-22 Julian Wood

Post Syndicated from Julian Wood original https://aws.amazon.com/blogs/compute/introducing-an-enhanced-in-console-editing-experience-for-aws-lambda/

AWS Lambda is introducing a new code editing experience in the AWS console based on the popular Code-OSS, Visual Studio Code Open Source code editor. This brings the familiar Visual Studio Code interface and many of the features directly into the Lambda console, allowing developers to use their preferred coding environment and tools in the cloud. The Lambda Code Editor displays larger function package sizes and also integrates with Amazon Q Developer. This is an AI-powered coding assistant that provides real-time suggestions and insights to help you write, understand, and troubleshoot your Lambda functions more efficiently.

Overview

Visual Studio Code is the most popular IDE among developers according to the 2023 Stack Overflow Developer Survey. Integrating Code-OSS into the Lambda Console brings a familiar, accessible, and customizable interface to the in-browser code editing capabilities. This provides a coding experience that is substantially similar to working with function code locally. You can install selected extensions, apply preferred themes and settings, and use your familiar keyboard shortcuts and coding preferences.

The new editing experience is included as part of the standard Lambda service, at no extra cost.

Accessibility

The update also addresses important accessibility needs. With features like high color contrast, keyboard-only navigation, and screen reader support, the Code-OSS integration ensures an inclusive and accessible coding experience for all developers.

Differences from Visual Studio Code IDE

The Lambda console’s Code-OSS integration complements, rather than fully replaces, local development workflows. You can view and edit function code that uses an interpreted language, not compiled languages, which is consistent with the previous Lambda console. The terminal window is also unavailable in Code-OSS.

AWS Toolkit for Visual Studio Code extensions

Deeper integration with the AWS Toolkit for VS Code extension provides access to a subset of AWS specific functionality, including Q Developer. This ensures that the Lambda code editing experience benefits from additional developer tooling enhancements provided through the AWS Toolkit.

Larger package sizes

With Lambda, the total package size for ZIP-based functions, including code and libraries, cannot exceed 50 MB. Previously Lambda imposed a 3MB limit for editing code in the console. Now you are able to view function package sizes up to 50 MB in the console, however, there is still a single file limit of 3 MB. This allows you to view function code even when you have larger dependencies.

Using the new features

Viewing code

To experience the new Lambda Code Editor, log into the AWS Management Console and navigate to the Lambda service. Create a new function or edit an existing one. The new Lambda Code Editor is ready to use, with no additional setup required.

This example shows editing an existing function, viewing the function code in the familiar Code-OSS editor.

Viewing function code in the Lambda Code Editor

Previously, the code was not viewable as the code package size was greater than 3 MB. The update allows you to view larger files. The following image shows a package size of 13.3 MB and the Code-OSS editor allows editing of the function handler.

Viewing larger package size

Environment variables

In the left pane, the environment variables are viewable for the function. Select the pencil icon to edit, add, and remove environment variables.

Viewing and editing environment variables

Creating test events

The new split-screen view allows you to test your function and see your code and test results side-by-side, simplifying test event configuration.

Select Create test event to open the panel.

Creating test event

You can create Private test events or Shareable test events for other builders to use with access to the account.

Generate an event using an event template for the Amazon API Gateway HTTP API event trigger that the function uses. Save the test event.

Creating API Gateway test event

Invoke function

Invoke the function by selecting the Invoke button

The function results appear in the Output panel, consistent with the local VS Code IDE experience.

Function invoke result

The function logs appear below the output.

Viewing function logs

This view allows you to view and edit your code, generate and use test events, and invoke your function, all visible within the familiar Lambda Code Editor interface.

Live Tail Logs

Lambda now natively supports Amazon CloudWatch Logs Live Tail. This is an interactive log streaming and analytics capability, which allows you to view and analyze your Lambda function logs in real time.

Select the Run and Debug icon in the Activity Bar on the left-hand side of the code editor in the Code tab.
Select Open CloudWatch Live Tail. This opens the CloudWatch Logs Live Tail bottom drawer.
Select Start to start a Live Tail session and view your Lambda function logs stream in real time.
Alternatively, navigate to the Test tab and select CloudWatch Logs Live Tail to start a Live Tail session.

CloudWatch Logs Live Tail

Keyboard shortcuts

In the left pane Extensions dialog, you can see the keyboard shortcuts are installed by default.

Viewing installed extensions

Select the Manage gear icon which shows which aspects are configurable.

Viewing configuration options

The Keyboard shortcuts dialog allows you to view and change the shortcuts.

Amending keyboard shortcuts

Command Palette

Viewing the Command Palette shows available commands.

Viewing Command Palette

Configuration settings

The Settings panel allows you to configure the Lambda Code Editor to match your local IDE environment if required.

Viewing Settings panel

Navigate to Themes | Color Themes to customize the theme, including dark mode.

Lambda Console Editor dark mode

Downloading function code and template

It is now easier to download the function code and an AWS Serverless Application Model (AWS SAM) template which represents the Cloudformation resources required to set up the function, policies, and triggers. This allows you to start in the console and more easily move to using infrastructure as code, which is a serverless best practice.

Navigate to the Activity Bar Run and Debug section.
Select Download code and SAM template.
Extract the .zip file and open the folder in your local VS Code IDE.

You can continue to edit the function in your local IDE experience, which is consistent with the Lambda Console Editor.

Local VS Code IDE to continue working on function

Using your local IDE terminal or AWS Toolkit for VS Code, you can update the existing function. You can also use AWS SAM functionality to build and deploy the template as a Cloudformation stack to the cloud.

Using Amazon Q

The Amazon Q Developer AI assistant integrates directly into the code editor. This reduces the need to consult external documentation or tutorials, streamlining your development workflow.

Amazon Q provides inline suggestions or by using keyboard shortcuts for common actions you take, such as initiating Amazon Q or accepting a recommendation.

This example below adds more functionality to a new Lambda function to download an object from S3 with the help of Amazon Q. Enter a comment explaining the functionality you need.

Asking Amazon Q a question

Select tab to accept the suggestion.

Accepting an Amazon Q suggestion

You can continue to invoke Q manually to keep adding more code suggestions.

Continue adding functionality with Amazon Q

Conclusion

Lambda is introducing a new AWS console code editing experience based on the popular Code-OSS, Visual Studio Code Open Source code editor. This brings the familiar VS Code IDE interface and features directly into the Lambda console so you can use your preferred coding environment and tools in the cloud. Invoke your function using a new split-screen view to see your code and test results side-by-side, simplifying test event configuration.

The code editor displays larger function package sizes, makes environment variables more visible, and also integrates with Amazon Q Developer. This provides real-time suggestions and insights to help you write, understand, and troubleshoot your Lambda functions more efficiently.

For more serverless learning resources, visit Serverless Land.

Peplink SpeedFusion: Bond Starlink with this Internet Game-Changer!

2024-10-22 Crosstalk Solutions

Post Syndicated from Crosstalk Solutions original https://www.youtube.com/watch?v=g7-44SOtEXw

Introducing AlmaLinux OS Kitten (AlmaLinux Blog)

2024-10-22 jzb

Post Syndicated from jzb original https://lwn.net/Articles/995140/

The AlmaLinux project has introduced a new edition called “Kitten”,
which will serve as “the direct upstream for AlmaLinux OS and is
the primary point for the AlmaLinux community to engage and influence
the future of AlmaLinux OS“. Not intended for production use, the
first release is based on CentOS Stream 10 source, which
will eventually be the basis for Red Hat Enterprise Linux (RHEL)
10:

Because we anticipated many changes in 10, we wanted to get a head
start on building AlmaLinux OS 10. Earlier this year we started
setting up infrastructure and the build pipeline for AlmaLinux OS 10,
and started testing using CentOS Stream 10’s code. Based on this
preparation work, we are excited to share that we have successfully
built a preview of AlmaLinux OS 10 that we are calling AlmaLinux OS
Kitten 10.

The first Kitten release previews a number of ways that AlmaLinux will
diverge from RHEL 10, including re-enabling frame pointers,
including Simple Protocol for Independent Computing Environments
(SPICE), and adding packages for Firefox and Thunderbird, which have
been dropped from CentOS Stream 10 in favor of Flatpak versions. New
installation images for Kitten will be built quarterly. See the release
notes for download links, installation instructions, and more
information.

AI 101: Classification vs. Predictive vs. Generative AI

2024-10-22 Stephanie Doyle

Post Syndicated from Stephanie Doyle original https://www.backblaze.com/blog/ai-101-classification-vs-predictive-vs-generative-ai/

A decorative image showing several buildings with digital lines flowing upward into a cloud.

It may seem like generative AI is the only game in town, or at least the only AI model worth paying attention to. But folks have been using AI models to do all kinds of things for years before ChatGPT, Claude, and Gemini came on the scene.

Today, I’m talking about the three different broadly defined categories of AI—classification, predictive, and generative—and what they’re good for.

Classification vs. Predictive vs. Generative AI Models: What’s the Diff?

Classification and predictive models have been foundational to AI for decades, powering applications like spam filters, cyber security tools, big data analysis, and demand forecasting. However, with recent advances, generative models like GPT and DALL-E have taken the spotlight, bringing up interesting existential (and legal) questions about the nature of creativity and creative work going forward. Understanding the distinctions and history of these models is key to grasping how AI continues to shape industries and innovation today.

Let’s see which category best applies to your particular problem.

AI classification models

A classification model is built to recognize, understand, and group data into preset categories. The model is fully trained using the training data and then evaluated using test data before being used to respond to unseen data. In general, such models infer answers for the current moment in time, for example, deciding whether an email is spam or phishing. In that case, the decision is based on comparing the incoming email to a model trained on previously classified email messages, both ones that the user has set or ones that the platform has. (The two are related, of course, as the platform’s filters often update to include aggregate user data.)

In business, classification models drive applications like spam detection, customer segmentation, and fraud detection. Healthcare uses classification models to diagnose diseases based on medical images or patient data. In finance, they help identify high-risk transactions. Social media platforms rely on these models to filter content, detect hate speech, and recommend posts. Overall, classification models are key to organizing large datasets efficiently and making decisions based on patterns, helping automate and optimize numerous industry processes.

AI prediction models

Predictive AI models utilize historical data, patterns, and trends to train the model, so they can be used to make informed decisions about future events or outcomes. Using Drive Stats as an example, we could theoretically build a model that, when given data about a particular drive model and failure rates, predicts the chance that a given hard drive will fail in the next 90 days. Predictive AI models typically require large amounts of data to be trained and are computationally expensive to generate.

Predicting Hard Drive Failure Rates with AI

Okay, we were being coy when we said “example.” Check out Andy Klein’s Tech Day 2024 presentation, “Predicting Hard Drive Failure Rates with AI” to see how this kind of predictive model works.

AI prediction models help predict customer behavior, sales trends, and demand, aiding in decision making and resource planning. In finance, these models are crucial for stock price forecasting, risk assessment, and credit scoring. Healthcare utilizes prediction models for patient outcome predictions, disease progression, and treatment effectiveness. They are also applied in weather forecasting, supply chain optimization, and energy usage management. By analyzing past data, prediction models provide insights that help organizations anticipate trends, make proactive decisions, and optimize performance across various industries.

Generative AI models

You know this one. Generative AI is about creating (sort of) new content. It uses neural networking, deep learning, and other techniques to infer and generate content that is based on patterns it observes in existing content all while mimicking the style and structure as requested. Image generators such as DALL-E and Stable Diffusion, and large language models like ChatGPT, Claude, and Gemini are easily accessible AI applications which have brought AI into the public eye.

Generative AI is at turns the thing that will revolutionize everything, a scary specter with near-sentience that will steal your job, or a big hallucinating fluke that tells you to put glue on pizza. There are some pretty cool use cases—for one, researchers are using generative AI for new drug discovery. But you’re most likely to run into generative AI in the following use cases: customer service chatbots, coding assistants, marketing support, and general business assistants that generate transcripts and summaries.

Unlocking the power of AI

Even with all the current hype around generative AI we are still in the early stages of development when it comes to AI systems given they are most useful in responding to queries based on the subject matter with which they were trained.

For example, an AI model trained to play chess might find playing checkers to be difficult. While the board, and number of players are the same, can a chess-playing AI model infer the allowed checker moves based on its understanding of chess? Even generative AI models like ChatGPT which are trained on a wide variety of subjects are still lacking a key ingredient to be truly useful to your organization: your data.

An AI chatbot, for example, isn’t going to perform the way you want it to without being powered by your organization’s data. And, how do you build an AI powered tool while keeping your private data private? We started to explore that very question in a recent webinar, “Leveraging your Cloud Storage Data in AI/ML Apps and Services.”

Tune in to learn more about the various ways AI/ML applications use and store data and get insights from our customers who leverage Backblaze B2 Cloud Object Storage for their AI/ML needs.

The post AI 101: Classification vs. Predictive vs. Generative AI appeared first on Backblaze Blog | Cloud Storage & Cloud Backup

Remote SLZB-06 Zigbee coordinator via WireGuard in Home Assistant

2024-10-22 BeardedTinker

Post Syndicated from BeardedTinker original https://www.youtube.com/watch?v=l3L2KrN62Xw

Another five stable kernels

2024-10-22 corbet

Post Syndicated from corbet original https://lwn.net/Articles/995130/

The
6.11.5,
6.6.58,
6.1.114,
5.15.169, and
5.10.228
stable kernels have all been released; each contains another set of
important fixes.

The Perfect Portrait Photography Workflow (on the go!)

2024-10-22 Matt Granger

Post Syndicated from Matt Granger original https://www.youtube.com/watch?v=afAymRa2dfE

OpenSSL 3.4.0 released

2024-10-22 corbet

Post Syndicated from corbet original https://lwn.net/Articles/995098/

Version 3.4.0 of the OpenSSL SSL/TLS library has been released. It adds a
number of new encryption algorithms, support for “directly fetched
composite signature algorithms such as RSA-SHA2-256“, and more. See the
release notes for details.

Security updates for Tuesday

2024-10-22 corbet

Post Syndicated from corbet original https://lwn.net/Articles/995095/

Security updates have been issued by Debian (ffmpeg, ghostscript, libsepol, openjdk-11, openjdk-17, perl, and python-sql), Oracle (389-ds-base, buildah, containernetworking-plugins, edk2, httpd, java-1.8.0-openjdk, java-11-openjdk, java-17-openjdk, java-21-openjdk, kernel, python-setuptools, skopeo, and webkit2gtk3), Red Hat (buildah), Slackware (openssl), SUSE (apache2, firefox, libopenssl-3-devel, podman, and python310-starlette), and Ubuntu (cups-browsed, firefox, libgsf, and linux-gke).

Джендърът и раждаемостта – фалшивата връзка

2024-10-22

Post Syndicated from original https://www.toest.bg/dzendurut-i-razhdaemostta-falshivata-vruzka/

Джендърът и раждаемостта – фалшивата връзка

Един от популярните аргументи срещу приемането на ЛГБТ общността е за ниската раждаемост. При скорошното гласуване на поправките в Закона за училищното и предучилищното образование, включващи забрана на пропагандата на нетрадиционна сексуалност в училищата, сред доводите в подкрепа намери място влошаващата се демографска картина в страната. Преди известно време адвокат Владимир Шейтанов във връзка с подписка срещу „джендър идеологията“ посочи, че една от причините да е против въпросната идеология е демографската картина в страната.

И в други държави се чуват подобни тези – че намаляването на раждаемостта, което се наблюдава в глобален план през последните десетилетия, особено в западния свят, се дължи на увеличаващите се права на жените и на представителите на ЛГБТ общността. Заради тях все по-малко хора имат деца. По този начин коефициентът на плодовитост (средният брой деца, родени от една жена) пада до под 2,1 промила – числото, смятано за необходимо, ако се търси поне неутрален прираст на населението. Тази теза отеква с особена сила в България, считана за страна с бързо намаляващо население.

Дали наистина причините са в повечето права на жените и ЛГБТ хората?

На пръв поглед този аргумент звучи убедително. В страни с прекалено висок прираст се отправят препоръки за въвеждане на образование за момичетата като начин да се ограничи раждаемостта. Също така е факт, че след бейбибума през 50-те години на миналия век раждаемостта в западните държави започва да спада след бурното десетилетие на 60-те, когато традиционната култура бива поставена под въпрос и започват процеси за сексуално освобождаване, включващо и повече права за хората с различна ориентация или полова идентичност.

Отношението към гей хората, което остава недобро дори след разгрома на нацистка Германия в края на Втората световна война, започва да се променя, макар и бавно. Тепърва на общността предстоят тежки изпитания, като пандемията от СПИН в края на 80-те, когато администрацията на президента Рейгън в САЩ се отнася с присмех към ставащото.

При все това тезата, която се извежда, е, че промяната на отношението към секса е подействала така, че хората да правят любов само за удоволствие, забравяйки дълга си към обществото – да създават деца и така да подсигурят бъдещето на социалните системи в държавите, които обитават.

От етическа гледна точка този аргумент е, меко казано, спорен. Ако бъде изведен до своята крайност, хората, които го изтъкват, възприемат човешките същества като добитък за разплод, чиято задача е да осигурява нови бурмички за системата. Те игнорират желанието на много хомосексуални хора да имат свои деца чрез осиновяване или сурогатно майчинство, като същевременно заклеймяват това и настояват ЛГБТ хората да влязат в хетеросексуален брак, в който да се възпроизведат физически, дори с цената на психичното си здраве и личното си нещастие.

Учудващо е, че през последните години такава гледна точка се възприема като дясна, тъй като тя е дълбоко колективистична. Освен това подобна теза жертва правата на индивида в името на общността, на чиито интереси той трябва да бъде изцяло подчинен. Такова мислене е по-характерно за тоталитарните режими от миналото и затова не е учудващо, че както нацизмът, така и сталинският комунизъм недолюбват хомосексуалността въпреки първоначалната разкрепостеност на болшевиките под ръководството на Ленин.

Авторитаризъм, репресии и логически грешки

В днешно време картината не е по-различна, като един от най-сигурните белези за наличието на авторитарен или диктаторски режим е репресията на различните. В момента най-известният пример е режимът на Владимир Путин в Русия, който превърна хомофобията в свое кредо по начин, напомнящ антисемитизма на нацистите. Но подобни са и политиките на Си Дзинпин, забранил появата на „женствени мъже“ в китайските филми и сериали, на Реджеп Тайип Ердоган, гордеещ се, че сред избирателите на неговата „Партия за справедливост и развитие“ няма ЛГБТ хора (бомбастично и съмнително твърдение, в интерес на истината), на печално известния Виктор Орбан, който се бори с „ЛГБТК офанзивата“ на Европейския съюз, на севернокорейските комунисти, смятащи, че еднополовите отношения „обиждат“ лидера им Ким Чен Ун, на различните ислямистки режими, директно отнемащи живота на хората, имали нещастието да се покорят на хомосексуалната природа, с която са родени.

Целта на този материал обаче не е да разобличава моралната несъстоятелност на тезата, че хомосексуалните трябва да подчинят живота си на необходимостта от биологично възпроизводство чрез лицемерен брак с лице от срещуположния пол. (Между другото, сексуалната ориентация, доказано от учени и лекари, не е волева.) Самата аргументация, че увеличението на правата на ЛГБТ хората води до намаляване на раждаемостта, е крайно съмнителна и не издържа проверката на фактите. Тук се прави честата логическа грешка да се открива каузалност там, където се наблюдава просто корелация. И това всъщност лесно може да се провери, стига човек да е честен и да изхожда от върховния научен принцип, че твърдение се прави чак след като са проверени фактите, а не се намират доказателства за предварително зададена теза – капан както на религиозното, така и на конспиративното мислене, които неслучайно често са свързани с хомофобни тропи.

Илюстрация на тази логическа грешка е популярният виц за човека, който се разболял от цироза и обвинил леда, тъй като пиел най-различни питиета – уиски, водка, вино – но винаги с лед. По същия начин хората, които твърдят, че повечето права на ЛГБТ общността водят до спад на раждаемостта, цитират демографската картина в западните държави, където тези права са установени и наложени, но игнорират останалите и по този начин получават съвсем изкривена картина на реалната ситуация.

Картината

В действителност, ако се гледа картината в рамките на Европейския съюз, страните с по-ниска раждаемост са в Източна Европа, където ЛГБТ правата като цяло не са на почит, докато държавите от Западна и Северна Европа се радват на повече деца в едно семейство. Коефициентът на плодовитост в България възлиза на 1,5 промила. В Унгария, смятана за бастион на християнството – 1,6. В Полша, където доскоро имаше зони, свободни от ЛГБТ – 1,3. В Руската федерация, където е сърцето на традиционните ценности – 1,5. В Украйна, където преди войната хората също се отнасяха със скептицизъм към исканията за повече права на ЛГБТ общността – 1,2.

За сравнение, във Франция, където президентът Оланд регистрира еднополовите бракове, раждаемостта възлиза на 1,9, а в Швеция, също известна с толерантността си – 1,6. В Норвегия – 1,5. Тези числа са около или над средното ниво, но и в останалите държави, където има толерантност, стойностите са сравними с тези на „традиционалистите“ източноевропейци – 1,6 за Великобритания, 1,8 за САЩ, 1,6 за Нидерландия, 1,5 за Швейцария. Числата за силно толерантната Испания са особено ниски – 1,3, но същото важи и за не особено приемащата Италия – 1,2.

Пъстрата картина показва, че не може да се установи дори корелация между ЛГБТ правата и ниската раждаемост, камо ли да се твърди, че едното е следствие от другото.

Разбира се, срещу това защитниците на тезата, че равните права на ЛГБТ хората понижават раждаемостта, биха предложили няколко контрааргумента. Единият би бил, че страните в Източна Европа вече са заразени от „бацила на либерализма“ и това води до спад на раждаемостта. Всъщност негативните демографски процеси у нас и в други страни от бившия социалистически блок започват много по-рано, още по времето на комунизма. Освен това в такъв случай колкото по-напреднали са държавите в посока равноправие, толкова по-ниска раждаемост би трябвало да имат, а такова нещо не се наблюдава. Напротив, в България раждаемостта е по-ниска от по-либералната Франция, но практически идентична с тази в по-консервативната Русия.

Вторият контрааргумент е, че раждаемостта в страни като Швеция и Франция се изкривява от мигрантите, които са социално консервативни. Тази теза звучи по-издържано, тъй като действително и в двете държави се наблюдават капсуловани общности, където властват порядките на консервативния ислям. Всъщност една от дилемите пред борците за човешки права днес е спорът дали колективните права на една общност да живее според традициите си имат преимущество пред правата на индивидите в тази общност, които може да не са съгласни с такъв начин на живот.

Все пак обаче трябва да се отбележи, че и в страните от Източна Европа има представители на малцинствени групи (например ромите у нас), така че това не е запазена марка само на западните общества. Следва да се имат предвид и процесите в самите мюсюлмански държави, защото те всъщност поставят най-сериозно под въпрос твърденията, че ЛГБТ правата идват за сметка на положителния прираст на населението.

Кралство Саудитска Арабия е държава, в която хомосексуалните хора нямат права – там различната сексуална ориентация се наказва със смърт. Коефициентът на плодовитост в държавата е почти идентичен с това, което се наблюдава в западния свят – 1,8. И все пак посоката е право надолу, като се има предвид, че през 1960 г. едно саудитско семейство е имало средно над 7 деца. Абсурдно е да се смята, че джендър идеологията е виновна за такъв спад, тъй като в Саудитска Арабия дори жените имат изключително ограничени граждански права.

Катар, където хомосексуалността се наказва със затвор, коефициентът на плодовитост е 1,9 – под необходимото за запазване на положителния прираст, а в Обединените арабски емирства – 1,6, малко по-висок от коефициента в България. Ислямска република Иран, която екзекутира гей гражданите си, е с коефициент 1,9 при близо 6 деца средно на семейство през 60-те, което показва, че след радикалноконсервативната революция на аятолах Хомейни, обезправила жени и хомосексуални, плодовитостта всъщност е спаднала.

Най-ниска е стойността на коефициента в Далечния изток: 1,1 за Южна Корея, където приемането на ЛГБТ хората е слабо; 1,5 за Китай, където лансирането на нетрадиционни сексуални отношения в медиите се възпира от управляващите; 1,4 за Япония, където хората все пак са малко по-толерантни. Дори консервативна Малайзия не блести с коефициента си 1,7, малко под Северна Корея (1,8). Мюсюлманска Индонезия, където в някои щати бият гей хората с бастуни, е под ръба – с 1,9. Индия, където сексът между хора от един и същи пол доскоро беше забранен, е с 2,03.

Във всички тези страни посоката в коефициента на фертилност спрямо 60-те години е право надолу – независимо дали подкрепят ЛГБТ правата, или не. Това сочи, че причините за спада на раждаемостта не се крият в степента на приемане на различните и свързването на двете неща е форма на логическа манипулация. Манипулация, която заинтересовани от разделение страни, изглежда, правят съзнателно.

„Значи е урбанизацията!“

Преди няколко години, в разгара на пандемията от COVID-19, присъствах на семинар, където стана дума за правата на ЛГБТ хората. Колега магистрант, за когото по-късно научих, че симпатизира на една от партиите в патриотичния спектър у нас, изказа тезата, че не можем да си позволим да насърчаваме ЛГБТ правата предвид ниската раждаемост у нас. Веднага направих упражнение, в което проверихме със сходни на цитираните по-горе данни от Световната банка, намиращи се лесно с Google, като акцентирах на примерите с Иран и Саудитска Арабия. Човекът веднага се изпусна:

Значи е урбанизацията!

И да, изглежда, че реалната причина за демографския спад по света се корени в преместването на населението от селата към големите градски центрове, където чисто пространствено е предизвикателно да имаш многолюдна челяд. Доказателство за това е, че единствените страни с много висок прираст се намират в Африка, Централна Америка и някои части на Азия, където голяма част от населението все още живее на село.

Всичко това води до извода, че ако България действително мисли да намери решение на демографския си проблем, политиките трябва да се насочат към насърчаването на виртуални офиси, позволяващи на хората да работят от дома си и да се местят в по-малки населени места; към преференциални данъчни ставки за бизнеси, откриващи работни места в села и по-малки градове; към гарантиране на свободно време, което да позволи на младите хора да водят пълноценен романтичен и сексуален живот с партньорите си; а също и към повече приемане на различните и разкриване на възможности те да създават свои семейства тук, в България, а не в Западна Европа, където много от тях емигрират, за да получат достоен живот.

Разбира се, много по-лесно е да се намерят врагове като „джендърите“ и „геите“, който да бъдат нарочени за виновни и да се мачкат – за това не се изисква нито особена изобретателност, нито кой знае какъв интелект. Само че след, да кажем, десет години ще видим, че просто сме изгонили едни хора от България, раждаемостта се е влошила допълнително, а проблемът с обезлюдяването се е обострил.

Тогава може би политтехнолозите на омразата ще изпитат хлад в стомаха, осъзнавайки какво са направили – загубили са жизненоважно време, което страната е трябвало да употреби за реформи и създаване на условия за по-добър живот за всички свои граждани.

А може би просто няма да им пука и ще си харчат някъде копейките със здраве. В крайна сметка още Христо Ботев го е написал:

Патриот е – душа дава […]
само, знайте, за парата,
като човек – що да прави?
продава си и душата.

How to use interface VPC endpoints to meet your security objectives

2024-10-22 Joaquin Manuel Rinaudo

Post Syndicated from Joaquin Manuel Rinaudo original https://aws.amazon.com/blogs/security/how-to-use-interface-vpc-endpoints-to-meet-your-security-objectives/

Amazon Virtual Private Cloud (Amazon VPC) endpoints—powered by AWS PrivateLink—enable customers to establish private connectivity to supported AWS services, enterprise services, and third-party services by using private IP addresses. There are three types of VPC endpoints: interface endpoints, Gateway Load Balancer endpoints, and gateway endpoints. An interface VPC endpoint, in particular, allows customers to design applications that connect to AWS services privately, including the more than 130 AWS services that are available over AWS PrivateLink. For a complete list of services that integrate with PrivateLink, see the documentation for VPC endpoints.

The decision regarding when to use interface VPC endpoints to further secure your AWS infrastructure depends on your need for additional security controls or your preferred architecture patterns. In this blog post, we present four security objectives that VPC endpoints help you achieve. It’s important to note that other non-security benefits, such as reduced latency and cost management, are not covered in this post. For more information on those benefits, see these topics:

Improve the latency of sensitive applications with Amazon SageMaker real-time inference endpoints
Reduce network costs by centralizing access to AWS services through AWS Transit Gateway

Background

By default, network packets that originate in the AWS network with a destination on the AWS network (for example, public endpoints for AWS services) stay in the AWS network, except traffic to and from AWS China Regions. In addition, all data flowing across the AWS global network that interconnects AWS data centers and Regions is automatically encrypted at the physical layer before it leaves AWS secured facilities.

AWS PrivateLink VPC endpoints enable customers to further enhance the security posture of their applications by establishing private connectivity to supported AWS services, enterprise services, and third-party services by using a private IP address. You can find patterns for how to use the different types of endpoints in the Securely Access Services Over AWS PrivateLink whitepaper.

An interface VPC endpoint is a collection of one or more elastic network interfaces with private IP addresses. These endpoints can serve as an entry point for traffic destined to a supported AWS service in the same AWS Region as the VPC, without requiring an internet gateway, NAT device, VPN connection, AWS Direct Connect connection, or a public IP. Customers can then use interface VPC endpoints to help meet multiple security objectives, such as the following:

Implement networks that are isolated from the internet
Implement a data perimeter by using VPC endpoint policies
Enable private connectivity to AWS service API endpoints for on-premises environments
Align with specific compliance requirements

In the rest of this post, we’ll discuss each of these objectives in detail and how interface VPC endpoints can help you implement them.

Security objective 1: Implementing networks that are isolated from the internet

If you operate sensitive workloads, you might require that they run in private subnets that are isolated from the internet. In this scenario, the subnets in the network don’t have routes to an internet gateway and won’t be able to either send packets to the internet or receive packets from it.

In this case, you can use interface VPC endpoints to connect your VPC to AWS services in the same Region as if they were in your VPC, without configuring an internet gateway, NAT instance, or route tables. For information on how to configure a cross-Region VPC interface endpoint by using VPC peering, see this guidance.

Figure 1 shows an example architecture with an Amazon Elastic Compute Cloud (Amazon EC2) instance running in an isolated network and using interface VPC endpoints to send messages to Amazon Simple Queue Service (Amazon SQS).

Figure 1: Isolated subnet for EC2 server sending messages to Amazon SQS

Security objective 2: Implement a data perimeter using VPC endpoint policies

A data perimeter is a set of guardrails to help ensure that only your trusted identities are accessing trusted resources from expected networks. Learn more about data perimeters on AWS.

You can use VPC endpoint policies to implement a data perimeter by allowing access to only trusted entities and resources from your network, helping to prevent unintended access. This enables you to take advantage of the power of AWS Identity and Access Management (IAM) policy and flexibility to control access to your resources at a granular level.

In the VPC diagram in Figure 2, EC2 instance traffic flows out through a firewall endpoint, NAT gateway, and internet gateway to reach the S3 public API endpoint, remaining within the AWS network. However, this setup does not allow the implementation of a logical data perimeter to control the specific resources that the EC2 instance can access.

Figure 2: Before implementing a data perimeter

In contrast, in the diagram in Figure 3, you can see how VPC interface endpoints enable the use of VPC endpoint policies to enforce a data perimeter, such as only allowing certain S3 buckets to be accessed by the EC2 instance.

Figure 3: After implementing a data perimeter

For example, you can attach a policy, similar to the one below, to an Amazon S3 interface or gateway VPC endpoint to restrict access from the VPC to only S3 buckets that are owned by the same AWS Organizations organization. Make sure to replace <MY-ORG-ID> with your own information.

{
    "Version": "2012-10-17",
        "Statement": [
        {
            "Effect": "Allow",
            "Principal": "*",
            "Action": "s3:*",
            "Resource": "*",
            "Condition": { "StringEquals": { "aws:ResourceOrgID": <MY-ORG-ID>}}
        }
    ]
}

As a further example, the following policy shows how you can limit access to only trusted identities. You can attach this policy to an S3 interface VPC endpoint to permit access only to principals from your organization, to help mitigate the risk of unintended disclosure through non-corporate credentials. Make sure to replace <MY-ORG-ID> with your own information.

{    "Version": "2012-10-17",
        "Statement": [
        {
            "Effect": "Allow",
            "Principal": "*",
            "Action": "s3:*",
            "Resource": "*",
            "Condition": { "StringEquals": { "aws:PrincipalOrgID": "<MY-ORG-ID>"}}
        }
    ]
}

Finally, you can create resource policies for your resources to restrict access to only VPC interface endpoints. For example, you can use the following policy from our Amazon S3 User Guide for S3 buckets. Make sure to replace <MY-VPCE-ID> and <MY-BUCKET> with your own information.

 {  "Version": "2012-10-17",
        "Statement": [
        {
            "Effect": "Deny",
            "Principal": "*",
            "Action": "s3:*",
            "Resource": ["arn:aws:s3::<MY-BUCKET>", "arn:aws:s3::<MY-BUCKET>/*"]
            "Condition": { "StringNotEquals": { "aws:SourceVpce": "<MY-VPCE-ID>"}}
        }
    ]
}

For more information on the use of these condition keys and how to implement a data perimeter, see this blog post.

Security objective 3: Enable private connectivity to AWS service API endpoints for on-premises environments

You might be required to run private connectivity to AWS only from your on-premises environments, such as when your on-premises firewalls are configured to limit the connectivity to the internet, including AWS public endpoints.

In this case, you can use interface VPC endpoints with Direct Connect private virtual interfaces (VIFs) or Site-to-Site VPN to extend private connectivity to your on-premises networks. With this setup, you can also enforce data perimeter rules like those shown earlier in this post.

For example, customers can use interface VPC endpoints from Amazon CloudWatch agents running on on-premises servers to CloudWatch through a private connection, as demonstrated in this blog post.

In the diagram in Figure 4, we show how you can extend this approach to include other services, such as Amazon S3, in a single VPC setup. To implement this pattern, you need to set up conditional forwarding on your on-premises DNS resolver to forward queries for amazonaws.com to an Amazon Route 53 Resolver’s inbound endpoint IPs.

The flow in this scenario is as follows:

The DNS query for your S3 endpoint from your on-premises host is routed to the locally configured on-premises DNS server.
The on-premises DNS server performs conditional forwarding to an Amazon Route 53 inbound resolver endpoint IP address.
The inbound resolver returns the IP address of the interface VPC endpoint, which allows the on-premises host to establish private connectivity through AWS VPN or AWS Direct Connect.

Figure 4: On-premises private connectivity to Amazon S3 and Amazon CloudWatch

You can extend this architecture to support a cross-Region and multi-VPC setup by using AWS Transit Gateway and Amazon Route 53 private hosted zones, as described in the Building a Scalable and Secure Multi-VPC AWS Network Infrastructure whitepaper. Keep in mind that a distributed VPC endpoint approach (one that uses one endpoint per VPC) will allow you to implement least-privilege policies in VPC endpoints. A centralized approach, while more cost-effective, can increase the complexity of maintaining least privilege in a single policy and increase the scope of impact of a security event.

Security objective 4: Align with specific compliance requirements

In certain cases, customers operating in industries such as financial services or healthcare need to maintain compliance with regulations or standards such as HIPAA, the EU-US Data Privacy Framework, and PCI DSS. Although all communication between instances and services hosted in AWS use the AWS private network, using an interface VPC endpoint can help prove to auditors that you’re applying a defense-in-depth approach. This approach includes designing your workloads to run in networks that are isolated from the internet or implementing additional conditions such as the example VPC endpoint policies shown earlier in this post.

You can use AWS Audit Manager to get started mapping your compliance requirements to industry and geographic frameworks, such as NIST SP 800-53 Rev. 5, FedRAMP, and PCI DSS, and to automate evidence collection for controls such as the use of VPC endpoints. If you also have custom compliance requirements, you can create your own custom controls by using the Configure Amazon Virtual Private Cloud (VPC) service endpoints core control in the AWS Audit Manager control library console.

If you want to know how the use of VPC endpoints can help you align with compliance requirements for your specific workload and require assistance beyond what is provided in the public documentation on the AWS Compliance Programs webpage, you can consult with AWS Security Assurance Services (AWS SAS). AWS SAS has expert consultants and advisors who can help you design your systems to achieve, maintain, and automate compliance in the cloud.

Conclusion

In this blog post, we presented four security objectives to consider when deciding whether to use AWS interface VPC endpoints. You can use this information when you design your architecture or create a threat model to help implement secure architectures for your AWS hosted workloads. If you want to learn more about AWS PrivateLink and interface endpoints, see the AWS PrivateLink documentation. If you’re interested in learning more about implementing data perimeter concepts by using VPC endpoints, we suggest this workshop.

If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, contact AWS Support.

Building Vectorize, a distributed vector database, on Cloudflare’s Developer Platform

2024-10-22 Jérôme Schneider

Post Syndicated from Jérôme Schneider original https://blog.cloudflare.com/building-vectorize-a-distributed-vector-database-on-cloudflare-developer-platform

Vectorize is a globally distributed vector database that enables you to build full-stack, AI-powered applications with Cloudflare Workers. Vectorize makes querying embeddings — representations of values or objects like text, images, audio that are designed to be consumed by machine learning models and semantic search algorithms — faster, easier and more affordable.

In this post, we dive deep into how we built Vectorize on Cloudflare’s Developer Platform, leveraging Cloudflare’s global network, Cache, Workers, R2, Queues, Durable Objects, and container platform.

What is a vector database?

A vector database is a queryable store of vectors. A vector is a large array of numbers called vector dimensions.

A vector database has a similarity search query: given an input vector, it returns the vectors that are closest according to a specified metric, potentially filtered on their metadata.

Vector databases are used to power semantic search, document classification, and recommendation and anomaly detection, as well as contextualizing answers generated by LLMs (Retrieval Augmented Generation, RAG).

Why do vectors require special database support?

Conventional data structures like B-trees, or binary search trees expect the data they index to be cheap to compare and to follow a one-dimensional linear ordering. They leverage this property of the data to organize it in a way that makes search efficient. Strings, numbers, and booleans are examples of data featuring this property.

Because vectors are high-dimensional, ordering them in a one-dimensional linear fashion is ineffective for similarity search, as the resulting ordering doesn’t capture the proximity of vectors in the high-dimensional space. This phenomenon is often referred to as the curse of dimensionality.

In addition to this, comparing two vectors using distance metrics useful for similarity search is a computationally expensive operation, requiring vector-specific techniques for databases to overcome.

Query processing architecture

Vectorize builds upon Cloudflare’s global network to bring fast vector search close to its users, and relies on many components to do so.

These are the Vectorize components involved in processing vector queries.

Vectorize runs in every Cloudflare data center, on the infrastructure powering Cloudflare Workers. It serves traffic coming from Worker bindings as well as from the Cloudflare REST API through our API Gateway.

Each query is processed on a server in the data center in which it enters, picked in a fashion that spreads the load across all servers of that data center.

The Vectorize DB Service (a Rust binary) running on that server processes the query by reading the data for that index on R2, Cloudflare’s object storage. It does so by reading through Cloudflare’s Cache to speed up I/O operations.

Searching vectors, and indexing them to speed things up

Being a vector database, Vectorize features a similarity search query: given an input vector, it returns the K vectors that are closest according to a specified metric.

Conceptually, this similarity search consists of 3 steps:

Evaluate the proximity of the query vector with every vector present in the index.
Sort the vectors based on their proximity “score”.
Return the top matches.

While this method is accurate and effective, it is computationally expensive and does not scale well to indexes containing millions of vectors (see Why do vectors require special database support? above).

To do better, we need to prune the search space, that is, avoid scanning the entire index for every query.

For this to work, we need to find a way to discard vectors we know are irrelevant for a query, while focusing our efforts on those that might be relevant.

Indexing vectors with IVF

Vectorize prunes the search space for a query using an indexing technique called IVF, Inverted File Index.

IVF clusters the index vectors according to their relative proximity. For each cluster, it then identifies its centroid, the center of gravity of that cluster, a high-dimensional point minimizing the distance with every vector in the cluster.

Once the list of centroids is determined, each centroid is given a number. We then structure the data on storage by placing each vector in a file named like the centroid it is closest to.

When processing a query, we then can then focus on relevant vectors by looking only in the centroid files closest to that query vector, effectively pruning the search space.

Compressing vectors with PQ

Vectorize supports vectors of up to 1536 dimensions. At 4 bytes per dimension (32 bits float), this means up to 6 KB per vector. That’s 6 GB of uncompressed vector data per million vectors that we need to fetch from storage and put in memory.

To process multi-million vector indexes while limiting the CPU, memory, and I/O required to do so, Vectorize uses a dimensionality reduction technique called PQ (Product Quantization). PQ compresses the vectors data in a way that retains most of their specificity while greatly reducing their size — a bit like down sampling a picture to reduce the file size, while still being able to tell precisely what’s in the picture — enabling Vectorize to efficiently perform similarity search on these lighter vectors.

In addition to storing the compressed vectors, their original data is retained on storage as well, and can be requested through the API; the compressed vector data is used only to speed up the search.

Approximate nearest neighbor search and result accuracy refining

By pruning the search space and compressing the vector data, we’ve managed to increase the efficiency of our query operation, but it is now possible to produce a set of matches that is different from the set of true closest matches. We have traded result accuracy for speed by performing an approximate nearest neighbor search, reaching an accuracy of ~80%.

To boost the result accuracy up to over 95%, Vectorize then performs a result refinement pass on the top approximate matches using uncompressed vector data, and returns the best refined matches.

Eventual consistency and snapshot versioning

Whenever you query your Vectorize index, you are guaranteed to receive results which are read from a consistent, immutable snapshot — even as you write to your index concurrently. Writes are applied in strict order of their arrival in our system, and they are funneled into an asynchronous process. We update the index files by reading the old version, making changes, and writing this updated version as a new object in R2. Each index file has its own version number, and can be updated independently of the others. Between two versions of the index we may update hundreds or even thousands of IVF and metadata index files, but even as we update the files, your queries will consistently use the current version until it is time to switch.

Each IVF and metadata index file has its own version. The list of all versioned files which make up the snapshotted version of the index is contained within a manifest file. Each version of the index has its own manifest. When we write a new manifest file based on the previous version, we only need to update references to the index files which were modified; if there are files which weren’t modified, we simply keep the references to the previous version.

We use a root manifest as the authority of the current version of the index. This is the pivot point for changes. The root manifest is a copy of a manifest file from a particular version, which is written to a deterministic location (the root of the R2 bucket for the index). When our async write process has finished processing vectors, and has written all new index files to R2, we commit by overwriting the current root manifest with a copy of the new manifest. PUT operations in R2 are atomic, so this effectively makes our updates atomic. Once the manifest is updated, Vectorize DB Service instances running on our network will pick it up, and use it to serve reads.

Because we keep past versions of index and manifest files, we effectively maintain versioned snapshots of your index. This means we have a straightforward path towards building a point-in-time recovery feature (similar to D1’s Time Travel feature).

You may have noticed that because our write process is asynchronous, this means Vectorize is eventually consistent — that is, there is a delay between the successful completion of a request writing on the index, and finally seeing those updates reflected in queries. This isn’t always ideal for all data storage use cases. For example, imagine two users using an online ticket reservation application for airline tickets, where both users buy the same seat — one user will successfully reserve the ticket, and the other will eventually get an error saying the seat was taken, and they need to choose again. Because a vector index is not typically used as a primary database for these transactional use cases, we decided eventual consistency was a worthy trade off in order to ensure Vectorize queries would be fast, high-throughput, and cheap even as the size of indexes grew into the millions.

Coordinating distributed writes: it’s just another block in the WAL

In the section above, we touched on our eventually consistent, asynchronous write process. Now we’ll dive deeper into our implementation.

The WAL

A write ahead log (WAL) is a common technique for making atomic and durable writes in a database system. Vectorize’s WAL is implemented with SQLite in Durable Objects.

In Vectorize, the payload for each update is given an ID, written to R2, and the ID for that payload is handed to the WAL Durable Object which persists it as a “block.” Because it’s just a pointer to the data, the blocks are lightweight records of each mutation.

Durable Objects (DO) have many benefits — strong transactional guarantees, a novel combination of compute and storage, and a high degree of horizontal scale — but individual DOs are small allotments of memory and compute. However, the process of updating the index for even a single mutation is resource intensive — a single write may include thousands of vectors, which may mean reading and writing thousands of data files stored in R2, and storing a lot of data in memory. This is more than what a single DO can handle.

So we designed the WAL to leverage DO’s strengths and made it a coordinator. It controls the steps of updating the index by delegating the heavy lifting to beefier instances of compute resources (which we call “Executors”), but uses its transactional properties to ensure the steps are done with strong consistency. It safeguards the process from rogue or stalled executors, and ensures the WAL processing continues to move forward. DOs are easy to scale, so we create a new DO instance for each Vectorize index.

WAL Executor

The executors run from a single pool of compute resources, shared by all WALs. We use a simple producer-consumer pattern using Cloudflare Queues. The WAL enqueues a request, and executors poll the queue. When they get a request, they call an API on the WAL requesting to be assigned to the request.

The WAL ensures that one and only one executor is ever assigned to that write. As the executor writes, the index files and the updated manifest are written in R2, but they are not yet visible. The final step is for the executor to call another API on the WAL to commit the change — and this is key — it passes along the updated manifest. The WAL is responsible for overwriting the root manifest with the updated manifest. The root manifest is the pivot point for atomic updates: once written, the change is made visible to Vectorize’s database service, and the updated data will appear in queries.

From the start, we designed this process to account for non-deterministic errors. We focused on enumerating failure modes first, and only moving forward with possible design options after asserting they handled the possibilities for failure. For example, if an executor stalls, the WAL finds a new executor. If the first executor comes back, the coordinator will reject its attempt to commit the update. Even if that first executor is working on an old version which has already been written, and writes new index files and a new manifest to R2, they will not overwrite the files written from the committed version.

Batching updates

Now that we have discussed the general flow, we can circle back to one of our favorite features of the WAL. On the executor, the most time-intensive part of the write process is reading and writing many files from R2. Even with making our reads and writes concurrent to maximize throughput, the cost of updating even thousands of vectors within a single file is dwarfed by the total latency of the network I/O. Therefore it is more efficient to maximize the number of vectors processed in a single execution.

So that is what we do: we batch discrete updates. When the WAL is ready to request work from an executor, it will get a chunk of “blocks” off the WAL, starting with the next un-written block, and maintaining the sequence of blocks. It will write a new “batch” record into the SQLite table, which ties together that sequence of blocks, the version of the index, and the ID of the executor assigned to the batch.

Users can batch multiple vectors to update in a single insert or upsert call. Because the size of each update can vary, the WAL adaptively calculates the optimal size of its batch to increase throughput. The WAL will fit as many upserted vectors as possible into a single batch by counting the number of updates represented by each block. It will batch up to 200,000 vectors at once (a value we arrived at after our own testing) with a limit of 1,000 blocks. With this throughput, we have been able to quickly load millions of vectors into an index (with upserts of 5,000 vectors at a time). Also, the WAL does not pause itself to collect more writes to batch — instead, it begins processing a write as soon as it arrives. Because the WAL only processes one batch at a time, this creates a natural pause in its workflow to batch up writes which arrive in the meantime.

Retraining the index

The WAL also coordinates our process for retraining the index. We occasionally re-train indexes to ensure the mapping of IVF centroids best reflects the current vectors in the index. This maintains the high accuracy of the vector search.

Retraining produces a completely new index. All index files are updated; vectors have been reshuffled across the index space. For this reason, all indexes have a second version stamp — which we call the generation — so that we can differentiate between retrained indexes.

The WAL tracks the state of the index, and controls when the training is started. We have a second pool of processes called “trainers.” The WAL enqueues a request on a queue, then a trainer picks up the request and it begins training.

Training can take a few minutes to complete, but we do not pause writes on the current generation. The WAL will continue to handle writes as normal. But the training runs from a fixed snapshot of the index, and will become out-of-date as the live index gets updated in parallel. Once the trainer has completed, it signals the WAL, which will then start a multi-step process to switch to the new generation. It enters a mode where it will continue to record writes in the WAL, but will stop making those writes visible on the current index. Then it will begin catching up the retrained index with all of the updates that came in since it started. Once it has caught up to all data present in the index when the trainer signaled the WAL, it will switch over to the newly retrained index. This prevents the new index from appearing to “jump back in time.” All subsequent writes will be applied to that new index.

This is all modeled seamlessly with the batch record. Because it associates the index version with a range of WAL blocks, multiple batches can span the same sequence of blocks as long as they belong to different generations. We can say this another way: a single WAL block can be associated with many batches, as long as these batches are in different generations. Conceptually, the batches act as a second WAL layered over the WAL blocks.

Indexing and filtering metadata

Vectorize supports metadata filters on vector similarity queries. This allows a query to focus the vector similarity search on a subset of the index data, yielding matches that would otherwise not have been part of the top results.

For instance, this enables us to query for the best matching vectors for color: “blue” and category: ”robe”.

Conceptually, what needs to happen to process this example query is:

Identify the set of vectors matching color: “blue” by scanning all metadata.
Identify the set of vectors matching category: “robe” by scanning all metadata.
Intersect both sets (boolean AND in the filter) to identify vectors matching both the color and category filter.
Score all vectors in the intersected set, and return the top matches.

While this method works, it doesn’t scale well. For an index with millions of vectors, processing the query that way would be very resource intensive. What’s worse, it prevents us from using our IVF index to identify relevant vector data, forcing us to compute a proximity score on potentially millions of vectors if the filtered set of vectors is large.

To do better, we need to prune the metadata search space by indexing it like we did for the vector data, and find a way to efficiently join the vector sets produced by the metadata index with our IVF vector index.

Indexing metadata with Chunked Sorted List Indexes

Vectorize maintains one metadata index per filterable property. Each filterable metadata property is indexed using a Chunked Sorted List Index.

A Chunked Sorted List Index is a sorted list of all distinct values present in the data for a filterable property, with each value mapped to the set of vector IDs having that value. This enables Vectorize to binary search a value in the metadata index in O(log n) complexity, in other words about as fast as search can be on a large dataset.

Because it can become very large on big indexes, the sorted list is chunked in pieces matching a target weight in KB to keep index state fetches efficient.

A lightweight chunk descriptor list is maintained in the index manifest, keeping track of the list chunks and their lower/upper values. This chunk descriptor list can be binary searched to identify which chunk would contain the searched metadata value.

Once the candidate chunk is identified, Vectorize fetches that chunk from index data and binary searches it to take the set of vector IDs matching a metadata value if found, or an empty set if not found.

We identify the matching vector set this way for every predicate in the metadata filter of the query, then intersect the sets in memory to determine the final set of vectors matched by the filters.

This is just half of the query being processed. We now need to identify the vectors most similar to the query vector, within those matching the metadata filters.

Joining the metadata and vector indexes

A vector similarity query always comes with an input vector. We can rank all centroids of our IVF vector index based on their proximity with that query vector.

The vector set matched by the metadata filters contains for each vector its ID and IVF centroid number.

From this, Vectorize derives the number of vectors matching the query filters per IVF centroid, and determines which and how many top-ranked IVF centroids need to be scanned according to the number of matches the query asks for.

Vectorize then performs the IVF-indexed vector search (see the section Searching Vectors, and indexing them to speed things up above) by considering only the vectors in the filtered metadata vector set while doing so.

Because we’re effectively pruning the vector search space using metadata filters, filtered queries can often be faster than their unfiltered equivalent.

Query performance

The performance of a system is measured in terms of latency and throughput.

Latency is a measure relative to individual queries, evaluating the time it takes for a query to be processed, usually expressed in milliseconds. It is what an end user perceives as the “speed” of the service, so a lower latency is desirable.

Throughput is a measure relative to an index, evaluating the number of queries it can process concurrently over a period of time, usually expressed in requests per second or RPS. It is what enables an application to scale to thousands of simultaneous users, so a higher throughput is desirable.

Vectorize is designed for great index throughput and optimized for low query latency to deliver great performance for demanding applications. Check out our benchmarks.

Query latency optimization

As a distributed database keeping its data state on blob storage, Vectorize’s latency is primarily driven by the fetch of index data, and relies heavily on Cloudflare’s network of caches as well as individual server RAM cache to keep latency low.

Because Vectorize data is snapshot versioned, (see Eventual consistency and snapshot versioning above), each version of the index data is immutable and thus highly cacheable, increasing the latency benefits Vectorize gets from relying on Cloudflare’s cache infrastructure.

To keep the index data lean, Vectorize uses techniques to reduce its weight. In addition to Product Quantization (see Compressing vectors with PQ above), index files use a space-efficient binary format optimized for runtime performance that Vectorize is able to use without parsing, once fetched.

Index data is fragmented in a way that minimizes the amount of data required to process a query. Auxiliary indexes into that data are maintained to limit the amount of fragments to fetch, reducing overfetch by jumping straight to the relevant piece of data on mass storage.

Vectorize boosts all vector proximity computations by leveraging SIMD CPU instructions, and by organizing the vector search in 2 passes, effectively balancing the latency/result accuracy ratio (see Approximate nearest neighbor search and result accuracy refining above).

When used via a Worker binding, each query is processed close to the server serving the worker request, and thus close to the end user, minimizing the network-induced latency between the end user, the Worker application, and Vectorize.

Query throughput

Vectorize runs in every Cloudflare data center, on thousands of servers across the world.

Thanks to the snapshot versioning of every index’s data, every server is simultaneously able to serve the index concurrently, without contention on state.

This means that a Vectorize index elastically scales horizontally with its distributed traffic, providing very high throughput for the most demanding Worker applications.

Increased index size

We are excited that our upgraded version of Vectorize can support a maximum of 5 million vectors, which is a 25x improvement over the limit in beta (200,000 vectors). All the improvements we discussed in this blog post contribute to this increase in vector storage. Improved query performance and throughput comes with this increase in storage as well.

However, 5 million may be constraining for some use cases. We have already heard this feedback. The limit falls out of the constraints of building a brand new globally distributed stateful service, and our desire to iterate fast and make Vectorize generally available so builders can confidently leverage it in their production apps.

We believe builders will be able to leverage Vectorize as their primary vector store, either with a single index or by sharding across multiple indexes. But if this limit is too constraining for you, please let us know. Tell us your use case, and let’s see if we can work together to make Vectorize work for you.

Try it now!

Every developer on a free plan can give Vectorize a try. You can visit our developer documentation to get started.

If you’re looking for inspiration on what to build, see the semantic search tutorial that combines Workers AI and Vectorize for document search, running entirely on Cloudflare. Or an example of how to combine OpenAI and Vectorize to give an LLM more context and dramatically improve the accuracy of its answers.

And if you have questions about how to use Vectorize for our product & engineering teams, or just want to bounce an idea off of other developers building on Workers AI, join the #vectorize and #workers-ai channels on our Developer Discord.

Is this thing on? Using OpenBMC and ACPI power states for reliable server boot

2024-10-22 Nnamdi Ajah

Post Syndicated from Nnamdi Ajah original https://blog.cloudflare.com/how-we-use-openbmc-and-acpi-power-states-to-monitor-the-state-of-our-servers

Introduction

At Cloudflare, we provide a range of services through our global network of servers, located in 330 cities worldwide. When you interact with our long-standing application services, or newer services like Workers AI, you’re in contact with one of our fleet of thousands of servers which support those services.

These servers which provide Cloudflare services are managed by a Baseboard Management Controller (BMC). The BMC is a special purpose processor — different from the Central Processing Unit (CPU) of a server — whose sole purpose is ensuring a smooth operation of the server.

Regardless of the server vendor, each server has this BMC. The BMC runs independently of the CPU and has its own embedded operating system, usually referred to as firmware. At Cloudflare, we customize and deploy a server-specific version of the BMC firmware. The BMC firmware we deploy at Cloudflare is based on the Linux Foundation Project for BMCs, OpenBMC. OpenBMC is an open-sourced firmware stack designed to work across a variety of systems including enterprise, telco, and cloud-scale data centers. The open-source nature of OpenBMC gives us greater flexibility and ownership of this critical server subsystem, instead of the closed nature of proprietary firmware. This gives us transparency (which is important to us as a security company) and allows us faster time to develop custom features/fixes for the BMC firmware that we run on our entire fleet.

In this blog post, we are going to describe how we customized and extended the OpenBMC firmware to better monitor our servers’ boot-up processes to start more reliably and allow better diagnostics in the event that an issue happens during server boot-up.

Server subsystems

Server systems consist of multiple complex subsystems that include the processors, memory, storage, networking, power supply, cooling, etc. When booting up the host of a server system, the power state of each subsystem of the server is changed in an asynchronous manner. This is done so that subsystems can initialize simultaneously, thereby improving the efficiency of the boot process. Though started asynchronously, these subsystems may interact with each other at different points of the boot sequence and rely on handshake/synchronization to exchange information. For example, during boot-up, the UEFI (Universal Extensible Firmware Interface), often referred to as the BIOS, configures the motherboard in a phase known as the Platform Initialization (PI) phase, during which the UEFI collects information from subsystems such as the CPUs, memory, etc. to initialize the motherboard with the right settings.

^{Figure 1: Server Boot Process}

When the power state of the subsystems, handshakes, and synchronization are not properly managed, there may be race conditions that would result in failures during the boot process of the host. Cloudflare experienced some of these boot-related failures while rolling out open source firmware (OpenBMC) to the Baseboard Management Controllers (BMCs) of our servers.

Baseboard Management Controller (BMC) as a manager of the host

A BMC is a specialized microprocessor that is attached to the board of a host (server) to assist with remote management capabilities of the host. Servers usually sit in data centers and are often far away from the administrators, and this creates a challenge to maintain them at scale. This is where a BMC comes in, as the BMC serves as the interface that gives administrators the ability to securely and remotely access the servers and carry out management functions. The BMC does this by exposing various interfaces, including Intelligent Platform Management Interface (IPMI) and Redfish, for distributed management. In addition, the BMC receives data from various sensors/devices (e.g. temperature, power supply) connected to the server, and also the operating parameters of the server, such as the operating system state, and publishes the values on its IPMI and Redfish interfaces.

^{Figure 2: Block diagram of BMC in a server system.}

At Cloudflare, we use the OpenBMC project for our Baseboard Management Controller (BMC).

Below are examples of management functions carried out on a server through the BMC. The interactions in the examples are done over ipmitool, a command line utility for interacting with systems that support IPMI.

# Check the sensor readings of a server remotely (i.e. over a network)
$  ipmitool <some authentication> <bmc ip> sdr
PSU0_CURRENT_IN  | 0.47 Amps         | ok
PSU0_CURRENT_OUT | 6 Amps            | ok
PSU0_FAN_0       | 6962 RPM          | ok
SYS_FAN          | 13034 RPM         | ok
SYS_FAN1         | 11172 RPM         | ok
SYS_FAN2         | 11760 RPM         | ok
CPU_CORE_VR_POUT | 9.03 Watts        | ok
CPU_POWER        | 76.95 Watts       | ok
CPU_SOC_VR_POUT  | 12.98 Watts       | ok
DIMM_1_VR_POUT   | 29.03 Watts       | ok
DIMM_2_VR_POUT   | 27.97 Watts       | ok
CPU_CORE_MOSFET  | 40 degrees C      | ok
CPU_TEMP         | 50 degrees C      | ok
DIMM_MOSFET_1    | 36 degrees C      | ok
DIMM_MOSFET_2    | 39 degrees C      | ok
DIMM_TEMP_A1     | 34 degrees C      | ok
DIMM_TEMP_B1     | 33 degrees C      | ok

…

# check the power status of a server remotely (i.e. over a network)
ipmitool <some authentication> <bmc ip> power status
Chassis Power is off

# power on the server
ipmitool <some authentication> <bmc ip> power on
Chassis Power Control: On

Switching to OpenBMC firmware for our BMCs gives us more control over the software that powers our infrastructure. This has given us more flexibility, customizations, and an overall better uniform experience for managing our servers. Since OpenBMC is open source, we also leverage community fixes while upstreaming some of our own. Some of the advantages we have experienced with OpenBMC include a faster turnaround time to fixing issues, optimizations around thermal cooling, increased power efficiency and supporting AI inference.

While developing Cloudflare’s OpenBMC firmware, however, we ran into a number of boot problems.

Host not booting: When we send a request over IPMI for a host to power on (as in the example above, power on the server), ipmitool would indicate the power status of the host as ON, but we would not see any power going into the CPU nor any activity on the CPU. While ipmitool was correct about the power going into the chassis as ON, we had no information about the power state of the server from ipmitool, and we initially falsely assumed that since the chassis power was on, the rest of the server components should be ON. The System Event Log (SEL), which is responsible for displaying platform-specific events, was not giving us any useful information beyond indicating that the server was in a soft-off state (powered off), working state (operating system is loading and running), or that a “System Restart” of the host was initiated.

# System Event Logs (SEL) showing the various power states of the server
$ ipmitool sel elist | tail -n3
  4d |  Pre-Init  |0000011021| System ACPI Power State ACPI_STATUS | S5_G2: soft-off | Asserted
  4e |  Pre-Init  |0000011022| System ACPI Power State ACPI_STATUS | S0_G0: working | Asserted
  4f |  Pre-Init  |0000011023| System Boot Initiated RESTART_CAUSE | System Restart | Asserted

In the System Event Logs shown above, ACPI is the acronym for Advanced Configuration and Power Interface, a standard for power management on computing systems. In the ACPI soft-off state, the host is powered off (the motherboard is on standby power but CPU/host isn’t powered on); according to the ACPI specifications, this state is called S5_G2. (These states are discussed in more detail below.) In the ACPI working state, the host is booted and in a working state, also known in the ACPI specifications as status S0_G0 (which in our case happened to be false), and the third row indicates the cause of the restart was due to a System Restart. Most of the boot-related SEL events are sent from the UEFI to the BMC. The UEFI has been something of a black box to us, as we rely on our original equipment manufacturers (OEMs) to develop the UEFI firmware for us, and for the generation of servers with this issue, the UEFI firmware did not implement sending the boot progress of the host to the BMC.

One discrepancy we observed was the difference in the power status and the power going into the CPU, which we read with a sensor we call CPU_POWER.

# Check power status
$ ipmitool <some authentication> <bmc ip>  power status
Chassis Power is on

However, checking the power into the CPU shows that the CPU was not receiving any power.

# Check power going into the CPU
$ ipmitool <some authentication> <bmc ip>  sdr | grep CPU_POWER    
CPU_POWER        | 0 Watts           | ok

The CPU_POWER being at 0 watts contradicts all the previous information that the host was powered up and working, when the host was actually completely shut down.

Missing Memory Modules: Our servers would randomly boot up with less memory than expected. Computers can boot up with less memory than installed due to a number of problems, such as a loose connection, hardware problem, or faulty memory. For our case, it happened not to be any of the usual suspects, but instead was due to both the BMC and UEFI trying to simultaneously read from the memory modules, leading to access contentions. Memory modules usually contain a Serial Presence Detect (SPD), which is used by the UEFI to dynamically detect the memory module. This SPD is usually located on an inter-integrated circuit (i2c), which is a low speed, two write protocol for devices to talk to each other. The BMC also reads the temperature of the memory modules via the i2c. When the server is powered on, amongst other hardware initializations, the UEFI also initializes the memory modules that it can detect via their (i.e. each individual memory modules) Serial Presence Detect (SPD), the BMC could also be trying to access the temperature of the memory module at the same time, over the same i2c protocol. This simultaneous attempted read denies one of the parties access. When the UEFI is denied access to the SPD, it thinks the memory module is not available and skips over it. Below is an example of the related i2c-bus contention logs we saw in the journal of the BMC when the host is booting.

kernel: aspeed-i2c-bus 1e78a300.i2c-bus: irq handled != irq. expected 0x00000021, but was 0x00000020

The above logs indicate that the i2c address 1e78a300 (which happens to be connected to the serial presence detect of the memory modules) could not properly handle a signal, known as an interrupt request (irq). When this scenario plays out on the UEFI, the UEFI is unable to detect the memory module.

^{Figure 3: I2C diagram showing I2C interconnection of the server’s memory modules (also known as DIMMs) with the BMC}

DIMM in Figure 3 refers to Dual Inline Memory Module, which is the type of memory module used in servers.

Thermal telemetry: During the boot-up process of some of our servers, some temperature devices, such as the temperature sensors of the memory modules, would show up as failed, thereby causing some of the fans to enter a fail-safe Pulse Width Modulation (PWM) mode. PWM is a technique to encode information delivered to electronic devices by adjusting the frequency of the waveform signal to the device. It is used in this case to control fan speed by adjusting the frequency of the power signal delivered to the fan. When a fan enters a fail-safe mode, PWM is used to set the fan speeds to a preset value, irrespective of what the optimized PWM setting of the fans should be, and this could negatively affect the cooling of the server and power consumption.

Implementing host ACPI state on OpenBMC

In the process of studying the issues we faced relating to the boot-up process of the host, we learned how the power state of the subsystems within the chassis changes. Part of our learnings led us to investigate the Advanced Configuration and Power Interface (ACPI) and how the ACPI state of the host changed during the boot process.

Advanced Configuration and Power Interface (ACPI) is an open industry specification for power management used in desktop, mobile, workstation, and server systems. The ACPI Specification replaces previous power management methodologies such as Advanced Power Management (APM). ACPI provides the advantages of:

Allowing OS-directed power management (OSPM).
Having a standardized and robust interface for power management.
Sending system-level events such as when the server power/sleep buttons are pressed
Hardware and software support, such as a real-time clock (RTC) to schedule the server to wake up from sleep or to reduce the functionality of the CPU based on RTC ticks when there is a loss of power.

From the perspective of power management, ACPI enables an OS-driven conservation of energy by transitioning components which are not in active use to a lower power state, thereby reducing power consumption and contributing to more efficient power management.

The ACPI Specification defines four global “Gx” states, six sleeping “Sx” states, and four “Dx” device power states. These states are defined as follows:

Gx	Name	Sx	Description
G0	Working	S0	The run state. In this state the machine is fully running
G1	Sleeping	S1	A sleep state where the CPU will suspend activity but retain its contexts.
S2	A sleep state where memory contexts are held, but CPU contexts are lost. CPU re-initialization is done by firmware.
S3	A logically deeper sleep state than S2 where CPU re-initialization is done by device. Equates to Suspend to RAM.
S4	A logically deeper sleep state than S3 in which DRAM is context is not maintained and contexts are saved to disk. Can be implemented by either OS or firmware.
G2	Soft off but PSU still supplies power	S5	The soft off state. All activity will stop, and all contexts are lost. The Complex Programmable Logic Device (CPLD) responsible for power-up and power-down sequences of various components e.g. CPU, BMC is on standby power, but the CPU/host is off.
G3	Mechanical off		PSU does not supply power. The system is safe for disassembly.
Dx	Name	Description
D0	Fully powered on	Hardware device is fully functional and operational
D1	Hardware device is partially powered down	Reduced functionality and can be quickly powered back to D0
D2	Hardware device is in a deeper lower power than D1	Much more limited functionality and can only be slowly powered back to D0.
D3	Hardware device is significantly powered down or off	Device is inactive with perhaps only the ability to be powered back on

The states that matter to us are:

S0_G0_D0: often referred to as the working state. Here we know our host system is running just fine.
S2_D2: Memory contexts are held, but CPU context is lost. We usually use this state to know when the host’s UEFI is performing platform firmware initialization.
S5_G2: Often referred to as the soft off state. Here we still have power going into the chassis, however, processor and DRAM context are not maintained, and the operating system power management of the host has no context.

Since the issues we were experiencing were related to the power state changes of the host — when we asked the host to reboot or power on — we needed a way to track the various power state changes of the host as it went from power off to a complete working state. This would give us better management capabilities over the devices that were on the same power domain of the host during the boot process. Fortunately, the OpenBMC community already implemented an ACPI daemon, which we extended to serve our needs. We added an ACPI S2_D2 power state, in which memory contexts are held, but CPU context is lost, to the ACPI daemon running on the BMC to enable us to know when the host’s UEFI is performing firmware initialization, and also set up various management tasks for the different ACPI power states.

An example of a power management task we carry out using the S0_G0_D0 state is to re-export our Voltage Regulator (VR) sensors on S0_G0_D0 state, as shown with the service file below:

cat /lib/systemd/system/Re-export-VR-device.service 
[Unit]
Description=RE Export VR Device Process
Wants=xyz.openbmc_project.EntityManager.service
After=xyz.openbmc_project.EntityManager.service
Conflicts=host-s2-state.target

[Service]
Type=simple
ExecStart=/bin/bash -c 'set -a && source /usr/bin/Re-export-VR-device.sh on'
SyslogIdentifier=Re-export-VR-device.service

[Install]
WantedBy=host-s0-state.target

Having set this up, OpenBMC has a Net Function (ipmiSetACPIState) in phosphor-host-ipmid that is responsible for setting the ACPIState of the host on the BMC. This command is called by the host using the standard ipmi command with the corresponding NetFn=0x06 and Cmd=0x06.

In the event of an immediate power cycle (i.e. host reboots without operating system shutdown), the host is unable to send its S5_G2 state to the BMC. For this case, we created a patch to OpenBMC’s x86-power-control to let the BMC become aware that the host has entered the ACPI S5_G2 state (i.e. soft-off). When the host comes out of the power off state, the UEFI performs the Power On Self Test (POST) and sends the S2_D2 to the BMC, and after the UEFI has loaded the OS on the host, it notifies the BMC by sending the ACPI S0_G0_D0 state.

Fixing the issues

Going back to the boot-up issues we faced, we discovered that they were mostly caused by devices which were in the same power domain of the CPU, interfering with the UEFI/platform firmware initialization phase. Below is a high level description of the fixes we applied.

Servers not booting: After identifying the devices that were interfering with the POST stage of the firmware initialization, we used the host ACPI state to control when we set the appropriate power mode state for those devices so as not to cause POST to fail.

Memory modules missing: During the boot-up process, memory modules (DIMMs) are powered and initialized in S2_D2 ACPI state. During this initialization process, UEFI firmware sends read commands to the Serial Presence Detect (SPD) on the DIMM to retrieve information for DIMM enumeration. At the same time, the BMC could be sending commands to read DIMM temperature sensors. This can cause SMBUS collisions, which could either cause DIMM temperature reading to fail or UEFI DIMM enumeration to fail. The latter case would cause the system to boot up with reduced DIMM capacity, which could be mistaken as a failing DIMM scenario. After we had discovered the race condition issue, we disabled the BMC from reading the DIMM temperature sensors during S2_D2 ACPI state and set a fixed speed for the corresponding fans. This solution allows our UEFI to retrieve all the necessary DIMM subsystems information for enumeration, and our servers now boot up with the correct size of memory.

Thermal telemetry: In S0_G0 power state, when sensors are not reporting values back to the BMC, the BMC assumes that devices may be overheating and puts the fan controller into fail-safe mode where fan speeds are ramped up to maximum speed. However, in S5_G2 state, some thermal sensors like CPU temperature, NIC temperature, etc. are not powered and not available. Our solution is to set these thermal sensors as non-functional in their exported configuration when in S5_G2 state and during the transition from S5_G2 state to S2_D2 state. Setting the affected devices as non-functional in their configuration, instead of waiting for thermal sensor read commands to error out, prevents the controller from entering the fail-safe mode.

Moving forward

Aside from resolving issues, we have seen other benefits from implementing ACPI Power State on our BMC firmware. An example is in the area of our automated firmware regression testing. Various parts of our tests require rebooting/power cycling the servers over a hundred times, during which we monitor the ACPI power state changes of our servers as against using a boolean (running or not running, pingable or not pingable) to assert the status of our servers.

Also, it has given us the opportunity to learn more about the complex subsystems in a server system, and the various power modes of the different subsystems. This is an aspect that we are still actively learning about as we look to further optimize various aspects of the boot sequence of our servers.

In the course of time, implementing ACPI states is helping us achieve the following:

All components are enabled by end of boot sequence,
BIOS and BMC are able to retrieve component information,
And the BMC is aware when thermal sensors are in a non-functional state.

For better observability of the boot progress and “last state” of our systems, we have also started the process of adding the BootProgress object of the Redfish ComputerSystem Schema into our systems. This will give us an opportunity for pre-operating system (OS) boot observability and an easier debug starting point when the UEFI has issues (such as when the server isn’t coming on) during the server platform initialization.

With each passing day, Cloudflare’s OpenBMC team, which is made up of folks from different embedded backgrounds, learns about, experiments with, and deploys OpenBMC across our global fleet. This has been made possible by relying on the OpenBMC community’s contribution (as well as upstreaming some of our own contributions), and our interaction with our various vendors, thereby giving us the opportunity to make our systems more reliable, and giving us the ownership and responsibility of the firmware that powers the BMCs that manage our servers. If you are thinking of embracing open-source firmware in your BMC, we hope this blog post written by a team which started deploying OpenBMC less than 18 months ago has inspired you to give it a try.

For those who are interested in considering making the jump to open-source firmware, check it out here!

No, The Chinese Have Not Broken Modern Encryption Systems with a Quantum Computer

2024-10-22 Bruce Schneier

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2024/10/no-the-chinese-have-not-broken-modern-encryption-systems-with-a-quantum-computer.html

The headline is pretty scary: “China’s Quantum Computer Scientists Crack Military-Grade Encryption .”

No, it’s not true.

This debunking saved me the trouble of writing one. It all seems to have come from this news article, which wasn’t bad but was taken widely out of proportion.

Cryptography is safe, and will be for a long time

EDITED TO ADD (11/3): Really good explainer from Dan Goodin.

K10+ vs. K10+ Pro: Is the Upgrade Worth It?

2024-10-22 BeardedTinker

Post Syndicated from BeardedTinker original https://www.youtube.com/watch?v=q4rtdBmToW4

Where Infor started

Challenges

SearchLatency Pre-Modernization

CPUUtilization Pre-Modernization

Infor’s journey to modernize search with OpenSearch Service

Operational review and enhancements

Storage optimization

Upgrading OpenSearch Service

Optimizing instance selection for performance

Results

Conclusion

About the Authors

Overview

Accessibility

Differences from Visual Studio Code IDE

AWS Toolkit for Visual Studio Code extensions

Larger package sizes

Using the new features

Viewing code

Environment variables

Creating test events

Invoke function

Live Tail Logs

Keyboard shortcuts

Command Palette

Configuration settings

Downloading function code and template

Using Amazon Q

Classification vs. Predictive vs. Generative AI Models: What’s the Diff?

AI classification models

AI prediction models

Predicting Hard Drive Failure Rates with AI

Generative AI models

Unlocking the power of AI

Дали наистина причините са в повечето права на жените и ЛГБТ хората?

Авторитаризъм, репресии и логически грешки

Картината

„Значи е урбанизацията!“

Background

Security objective 1: Implementing networks that are isolated from the internet

Security objective 2: Implement a data perimeter using VPC endpoint policies

Security objective 3: Enable private connectivity to AWS service API endpoints for on-premises environments

Security objective 4: Align with specific compliance requirements

Conclusion

What is a vector database?

Why do vectors require special database support?

Query processing architecture

Searching vectors, and indexing them to speed things up

Indexing vectors with IVF

Compressing vectors with PQ

Approximate nearest neighbor search and result accuracy refining

Eventual consistency and snapshot versioning

Coordinating distributed writes: it’s just another block in the WAL

The WAL

WAL Executor

Batching updates

Retraining the index

Indexing and filtering metadata

Indexing metadata with Chunked Sorted List Indexes

Joining the metadata and vector indexes

Query performance

Query latency optimization

Query throughput

Increased index size

Try it now!

Introduction

Server subsystems

Baseboard Management Controller (BMC) as a manager of the host

Implementing host ACPI state on OpenBMC

Fixing the issues

Moving forward

The collective thoughts of the interwebz