Accelerate large-scale AI applications with the new Amazon EC2 P6-B300 instances

2025-11-19 Veliswa Boya

Post Syndicated from Veliswa Boya original https://aws.amazon.com/blogs/aws/accelerate-large-scale-ai-applications-with-the-new-amazon-ec2-p6-b300-instances/

Today, we’re announcing the general availability of Amazon Elastic Compute Cloud (Amazon EC2) P6-B300 instances, our next-generation GPU platform accelerated by NVIDIA Blackwell Ultra GPUs. These instances deliver 2 times more networking bandwidth, and 1.5 times more GPU memory compared to previous generation instances, creating a balanced platform for large-scale AI applications.

With these improvements, P6-B300 instances are ideal for training and serving large-scale AI models, particularly those employing sophisticated techniques such as Mixture of Experts (MoE) and multimodal processing. For organizations working with trillion-parameter models and requiring distributed training across thousands of GPUs, these instances provide the perfect balance of compute, memory, and networking capabilities.

Improvements made compared to predecessors
The P6-B300 instances deliver 6.4Tbps Elastic Fabric Adapter (EFA) networking bandwidth, supporting efficient communication across large GPU clusters. These instances feature 2.1TB of GPU memory, allowing large models to reside within a single NVLink domain, which significantly reduces model sharding and communication overhead. When combined with EFA networking and the advanced virtualization and security capabilities of AWS Nitro System, these instances provide unprecedented speed, scale, and security for AI workloads.

The specs for the EC2 P6-B300 instances are as follows.

Instance size	VCPUs	System memory	GPUs	GPU memory	GPU-GPU interconnect	EFA network bandwidth	ENA bandwidth	EBS bandwidth	Local storage
P6-B300.48xlarge	192	4TB	8x B300 GPU	2144GB HBM3e	1800 GB/s	6.4 Tbps	300 Gbps	100 Gbps	8x 3.84TB

Good to know
In terms of persistent storage, AI workloads primarily use a combination of high performance persistent storage options such as Amazon FSx for Lustre, Amazon S3 Express One Zone, and Amazon Elastic Block Store (Amazon EBS), depending on price performance considerations. For illustration, the dedicated 300Gbps Elastic Network Adapter (ENA) networking on P6-B300 enables high-throughput hot storage access with S3 Express One Zone, supporting large-scale training workloads. If you’re using FSx for Lustre, you can now use EFA with GPUDirect Storage (GDS) to achieve up to 1.2Tbps of throughput to the Lustre file system on the P6-B300 instances to quickly load your models.

Available now
The P6-B300 instances are now available through Amazon EC2 Capacity Blocks for ML and Savings Planin the US West (Oregon) AWS Region.
For on-demand reservation of P6-B300 instances, please reach out to your account manager. As usual with Amazon EC2, you pay only for what you use. For more information, refer to Amazon EC2 Pricing. Check out the full collection of accelerated computing instances to help you start migrating your applications.

To learn more, visit our Amazon EC2 P6-B300 instances page. Send feedback to AWS re:Post for EC2 or through your usual AWS Support contacts.

– Veliswa

Python 3.14 runtime now available in AWS Lambda

2025-11-18 Leandro Cavalcante Damascena

Post Syndicated from Leandro Cavalcante Damascena original https://aws.amazon.com/blogs/compute/python-3-14-runtime-now-available-in-aws-lambda/

AWS Lambda now supports Python 3.14 as both a managed runtime and container base image. Python is a popular language for building serverless applications. Developers can now take advantage of new features and enhancements when creating serverless applications on Lambda.

You can develop Lambda functions in Python 3.14 using the AWS Management Console, AWS Command Line Interface (AWS CLI), AWS SDK for Python (Boto3), AWS Serverless Application Model (AWS SAM), AWS Cloud Development Kit (AWS CDK), and other infrastructure as code tools.

The Python 3.14 runtime supports Powertools for AWS Lambda (Python), a developer toolkit that helps you to implement serverless best practices. Powertools includes observability, batch processing, AWS Systems Manager Parameter Store integration, idempotency, feature flags, Amazon CloudWatch metrics, structured logging, and more.

Lambda@Edge allows you to use Python 3.14 to customize low-latency content delivered through Amazon CloudFront.

This blog post highlights notable Python language updates, Python Lambda runtime features and support, and how you can use the new Python 3.14 runtime in your serverless applications.

New Python features

Python 3.14 contains the following notable updates.

Template strings literal

Template strings introduce a new mechanism for custom string processing using the t prefix instead of f for f-strings. Unlike f-strings that return a simple string, t-strings return an object representing both static and interpolated parts.

Evaluation of type annotations

With the implementation of PEP 649, Python 3.14 defers type annotation evaluation until required. This reduces import time overhead and resolves forward reference issues.

Improved Error Messages

The interpreter now provides helpful suggestions when it detects typos in Python keywords. These include incorrect control flow structures, misused conditional expressions, string syntax errors, incompatible type usage in dicts/sets, and context manager protocol mismatches.

whille :

Traceback (most recent call last):
  File "<stdin>", line 1
    whille :
    ^^^^^^
SyntaxError: invalid syntax. Did you mean 'while'?

Standard library

The standard library includes a new compression.zstd module that provides native support for zstandard compression, offering better compression ratios and faster decompression compared to existing algorithms.

Python 3.14 also includes improved error messages and enhanced asyncio introspection capabilities.

Lambda runtime changes

The Lambda Python runtime contains the following changes.

Python 3.14 features that are not available

Python 3.14 includes some features that are not enabled for the Lambda managed runtime or base images. These features must be enabled when the Python runtime is compiled and cannot be enabled via an execution-time flag. The just-in-time (JIT) compiler is not available in the Lambda runtime because it’s still in an experimental phase. Free-threaded mode, running Python without the global interpreter lock, is supported in Python 3.14, but it is not enabled in the Lambda runtime due to potential performance impact. To use these features in Lambda, you can deploy your own Python runtime build with these features enabled, using a container image or custom runtime.

Amazon Linux 2023

As with the Python 3.12 and Python 3.13 runtimes, the Python 3.14 runtime is based on the provided.al2023 runtime, which is based on the Amazon Linux 2023 minimal container image. The Amazon Linux 2023 minimal image uses microdnf as a package manager, symlinked as dnf. This replaces the yum package manager used in Python 3.11 and earlier AL2-based images. If you deploy your Lambda functions as container images, you must update your Dockerfiles to use dnf instead of yum when upgrading to the Python 3.14 base image from Python 3.11 or earlier base images.

Learn more about the provided.al2023 runtime in the blog post Introducing the Amazon Linux 2023 runtime for AWS Lambda and the Amazon Linux 2023 launch blog post.

Using Python 3.14 in Lambda

You can use Python 3.14 for your Lambda functions in the AWS Management Console, an AWS Lambda container image, or the AWS Cloud Development Kit (AWS CDK).

AWS Management Console

To use the Python 3.14 runtime to develop your Lambda functions, specify a runtime parameter value of Python 3.14 when creating or updating a function. On the Create Function page of the AWS Lambda console, Python 3.14 is available in the Runtime dropdown menu.

To update an existing Lambda function to Python 3.14, navigate to the function in the Lambda console and choose Edit in the Runtime settings panel. The new version of Python is available in the Runtime dropdown menu.

Upgrading a function to Python 3.14

To upgrade a function to Python 3.14, check your code and dependencies for compatibility with Python 3.14, run tests, and update as necessary. Consider using generative AI coding assistants like Amazon Q Developer, Amazon Q Developer for CLI, or Kiro to help with upgrades.

AWS Lambda container image

Change the Python base image version by modifying the FROM statement in your Dockerfile:

FROM public.ecr.aws/lambda/python:3.14
# Copy function code
COPY lambda_handler.py ${LAMBDA_TASK_ROOT}

AWS Serverless Application Model (AWS SAM)

In AWS SAM set the Runtime attribute to python3.14 to use this version.

AWSTemplateFormatVersion: '2010-09-09'
Transform: AWS::Serverless-2016-10-31
Description: Simple Lambda Function
  MyFunction:
    Type: AWS::Serverless::Function
    Properties:
      Description: My Python Lambda Function
      CodeUri: my_function/
      Handler: lambda_function.lambda_handler
      Runtime: python3.14

AWS SAM supports generating this template with Python 3.14 for new serverless applications using the sam init command. Refer to the AWS SAM documentation.

AWS Cloud Development Kit

In the AWS CDK, set the runtime attribute to lambda.Runtime.PYTHON_3_14 to use this version.

In Python CDK:

from constructs import Construct
from aws_cdk import ( App, Stack, aws_lambda as _lambda )
class SampleLambdaStack(Stack):
    def __init__(self, scope: Construct, id: str, **kwargs) -> None:
        super().__init__(scope, id, **kwargs)

        base_lambda = _lambda.Function(self, 'python314LambdaFunction',
                                       handler='lambda_handler.handler',
                                    runtime=_lambda.Runtime.PYTHON_3_14,
                                 code=_lambda.Code.from_asset('lambda'))

In TypeScript CDK:

import * as cdk from 'aws-cdk-lib';
import * as lambda from 'aws-cdk-lib/aws-lambda'
import * as path from 'path';
import { Construct } from 'constructs';
export class SampleLambdaStack extends cdk.Stack {
  constructor(scope: Construct, id: string, props?: cdk.StackProps) {
    super(scope, id, props);
    // The code that defines your stack goes here
    // The python3.14 enabled Lambda Function
    const lambdaFunction = new lambda.Function(this, 'python314LambdaFunction', {
      runtime: lambda.Runtime.PYTHON_3_14,
      memorySize: 512,
      code: lambda.Code.fromAsset(path.join(__dirname, '/../lambda')),
      handler: 'lambda_handler.handler'
    })
  }
}

Serverless Land Patterns AWS Top Picks for Python, now use Python 3.14.

Performance considerations

At launch, new Lambda runtimes receive less usage than existing established runtimes. This can result in longer cold start times due to reduced cache residency within internal Lambda sub-systems. Cold start times typically improve in the weeks following launch as usage increases. As a result, AWS recommends not drawing conclusions from side-by-side performance comparisons with other Lambda runtimes until the performance has stabilized. Since performance is highly dependent on workload, customers with performance-sensitive workloads should conduct their own testing instead of relying on generic test benchmarks.

Conclusion

Lambda now supports Python 3.14 as a managed language runtime to help developers build more efficient, powerful, and scalable serverless applications. Python 3.14 language additions include data model improvements, typing changes, and updates to the standard library. The Lambda managed runtime does not include the option to disable the global interpreter lock (GIL) or use the experimental JIT compiler.

You can build and deploy functions using Python 3.14 using the AWS Management Console, AWS CLI, AWS SDK, AWS SAM, AWS CDK, or your choice of infrastructure as code tool. You can also use the Python 3.14 container base image if you prefer to build and deploy your functions using container images.

Try the Python 3.14 runtime in Lambda today and experience the benefits of this updated language version.

To find more Python examples, use the Serverless Patterns Collection. For more serverless learning resources, visit Serverless Land.

Blender 5.0 released

2025-11-18 jzb

Post Syndicated from jzb original https://lwn.net/Articles/1046962/

Version
5.0 of the Blender animation system has been released. Notable
improvements include improved color management, HDR capabilities, and
a new storyboarding template. See the release
notes for a lengthy list of new features and changes, and the bugfixes
page for the 588 commits that fixed bugs in Blender 4.5 or older.

HPE Shows off AMD EPYC Venice and SP7 Supercomputing Node at SC25

2025-11-18 Cliff Robinson

Post Syndicated from Cliff Robinson original https://www.servethehome.com/hpe-shows-off-amd-epyc-venice-and-sp7-supercomputing-node-at-sc25/

HPE showed off its next-generation AMD EPYC “Venice” socket SP7 supercomputing platform at SC25 along with its Slingshot 400 interconnect

The post HPE Shows off AMD EPYC Venice and SP7 Supercomputing Node at SC25 appeared first on ServeTheHome.

Shielded vs Unshielded Ethernet – Stop Wasting Money on the Wrong Cable

2025-11-18 Crosstalk Solutions

Post Syndicated from Crosstalk Solutions original https://www.youtube.com/shorts/2Cr9DbzTlNI

Public Media #lastweektonight

2025-11-18 LastWeekTonight

Post Syndicated from LastWeekTonight original https://www.youtube.com/shorts/7hkuCkya9IQ

How to automate Session Manager preferences across your organization

2025-11-18 Nima Fotouhi

Post Syndicated from Nima Fotouhi original https://aws.amazon.com/blogs/security/how-to-automate-session-manager-preferences-across-your-organization/

AWS Systems Manager Session Manager is a fully managed service that provides secure, interactive, one-click access to your Amazon Elastic Compute Cloud (Amazon EC2) instances, edge devices, and virtual machines (VMs) through a browser-based shell or AWS Command Line Interface (AWS CLI), without requiring open inbound ports, bastion hosts, or SSH keys. Session Manager helps you maintain security compliance and controlled access while providing users with access to managed nodes. When starting a session, you must specify a preferences document (known as the Session Manager preferences document) to set the session parameters.

While providing users with access to managed nodes, managing these preferences consistently across multiple AWS Regions and accounts in a large organization can be challenging. Organizations often need to maintain standardized security settings, logging configurations, and session controls across their entire AWS footprint. Manual configuration of these preferences in each Region and account is not only time-consuming but also prone to human error and can lead to security gaps or compliance violations. Additionally, tracking and maintaining these configurations becomes increasingly complex as the organization scales.

You can use Session Manager to control various session options including data encryption for session data in transit and session logs at rest, session duration, and logging. For example, you can specify whether to store session log data in an Amazon Simple Storage Service (Amazon S3) bucket or Amazon CloudWatch Logs log group. In this post, I demonstrate how to manage Session Manager preferences across your organization using AWS CloudFormation StackSets. You can use CloudFormation StackSets to manage resources and configurations, such as Session Manager preferences, across different AWS accounts and Regions using standardized templates to maintain consistent security and compliance standards across your entire AWS infrastructure.

Prerequisites

You need to meet the following prerequisites to deploy the solution in this post:

Basic understanding of CloudFormation
Trusted access enabled between CloudFormation StackSets and AWS Organizations
Access to an AWS management account or StackSet delegated admin account
Appropriate AWS Identity and Access Management (IAM) permissions to create and manage StackSets

The Session Manager environment has some additional prerequisites:

For EC2 instances with internet access, allow HTTPS (port 443) outbound traffic to:
- ec2messages.<region>.amazonaws.com
- ssm.<region>.amazonaws.com
- ssmmessages.<region>.amazonaws.com
Note: <region> represents the actual Region where you are deploying your instances.
Additional endpoints required for specific features:
- For CloudWatch Logs integration: logs.<region>.amazonaws.com
- For Amazon S3 log storage: s3.<region>.amazonaws.com
- For session data encryption: kms.<region>.amazonaws.com
Note: For EC2 instances without internet access, you must configure virtual private cloud (VPC) endpoints to maintain connectivity with Systems Manager and related services.
SSM Agent requirements:
- Minimum version 2.3.68.0 for basic session connectivity
- Version 3.0.222.0 or later for port forwarding and SSH sessions
Note: Many AWS-provided and trusted third-party Amazon Machine Images (AMIs) come with the SSM Agent pre-installed. For more information, see Find AMIs with the SSM Agent preinstalled.

For a complete list of requirements, see Setting up Session Manager.

Solution overview

This solution, shown in Figure 1, automatically configures the SSM-SessionManagerRunShell document with customizable preferences that govern how Session Manager behaves across your AWS accounts. It creates resources for logging, encryption, and session controls, and updates the SSM-SessionManagerRunShell document with these preferences. The document is updated by an AWS Lambda function that helps make sure that the preferences are correctly applied. It transforms the default Session Manager preferences document to meet your enterprise compliance requirements. Changes are deployed using CloudFormation template provided in the GitHub repository. The solution supports multiple logging destinations, encryption options, and session controls to meet various security and compliance requirements.

Figure 1: Solution overview

Walkthrough

To deploy the solution, complete the following steps.

Step 1: Download or clone the repository

The first step is to download or clone the GitHub repository.

To download the repository:

Go to the main page of the repository on GitHub.
Choose Code and then choose Download ZIP.

To clone the repository:

Make sure that you have Git installed.
Run the following command in your terminal:
git clone https://github.com/aws-samples/<repo-link>

Step 2: Create the CloudFormation StackSet

In this step, you deploy the solution’s resources by creating a CloudFormation StackSet using the provided CloudFormation template. Sign in to your management account or StackSet delegated admin account. To create the stack, follow the steps in Get started with StackSets using a sample template. Create the StackSet in each of the accounts and Regions where you plan to implement the solution. Note that you need to provide values for the parameters defined in the template to deploy the stack. The following table lists the parameters that you need to provide.

Parameter	Description
`S3Logging`	Enables storing session logs to an S3 bucket.
`S3BucketName`	Name of the S3 bucket for session logs. The bucket must exist or the deployment will fail.
`S3KeyPrefix`	Key prefix for session logs, will be appended by account ID and Region
`S3EncryptionEnabled`	If set to `true`, the S3 bucket you specified in the `s3BucketName` input must be encrypted.
`CreateCWLogGroup`	Creates the CloudWatch log group. If set to `true`, a CloudWatch log group will be created; if not, the log group name passed is used.
`CWLogGroupName`	The name of the CloudWatch log group you want to send session logs to.
`CWEncryptionEnabled`	If set to `true`, the CloudWatch log group you specified in the `cwLogGroupName` input must be encrypted.
`CWStreamingEnabled`	If set to `true`, a continual stream of session data logs is sent to the log group.
`SessionDataEncryption`	If set to `true`, session data is encrypted with a key created by the stack.
`RunAsEnabled`	If set to `true`, sessions are run using another user than `ssm-user`. The Run As feature is only supported for connecting to Linux and macOS managed nodes.
`RunAsDefaultUser`	The name of the user account to start sessions with on Linux and macOS managed nodes when the `runAsEnabled` input is set to `true`.
`IdleSessionTimeout`	The amount of time of inactivity you want to allow before a session ends. This input is measured in minutes.
`MaxSessionDuration`	The maximum amount of time you want to allow before a session ends. This input is measured in minutes.
`WinShellProfile`	The shell preferences, environment variables, working directories, and commands you specify for sessions on Windows Server managed nodes.
`LinuxShellProfile`	The shell preferences, environment variables, working directories, and commands you specify for sessions on Linux and macOS managed nodes.

Step 3: Update your EC2 instance profiles with proper permissions

Depending on the parameter values you pass when deploying the template, you need to update your EC2 instance profiles with proper permissions. For example, if you have enabled session data and session log encryption, you need to add the following policy to your instance profiles.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Action": [
                "logs:DescribeLogGroups"
            ],
            "Resource": "*",
            "Effect": "Allow"
        },
        {
            "Action": [
                "logs:DescribeLogStreams"
            ],
            "Resource": "<arn:aws:logs:*:123456789012:log-group:ssm-sessionmanager-logs>",
            "Effect": "Allow"
        },
        {
            "Action": [
                "logs:CreateLogStream",
                "logs:PutLogEvents"
            ],
            "Resource": "arn:aws:logs:*:123456789012:log-group:ssm-sessionmanager-logs:log-stream:*",
            "Effect": "Allow"
        },
        {
            "Condition": {
                "Null": {
                    "kms:ResourceAliases": "false"
                },
                "ForAllValues:StringLike": {
                    "kms:ResourceAliases": [
                        "alias/session-manager/data"
                    ]
                }
            },
            "Action": [
                "kms:Decrypt"
            ],
            "Resource": "arn:aws:kms:us-east-1:123456789012:key/*",
            "Effect": "Allow"
        }
    ]
}

Note: If you enable S3 logging, you need to add the required permissions for that as well. See Configure a central S3 bucket for Session Manager logging article on AWS re:Post for more information about how to properly configure your S3 bucket and EC2 instance profile for centralized logging. Same-account logging follows a similar pattern.

Step 4: Verify the solution implementation

You can verify that the Session Manager preferences are correctly configured across your environment. Here’s a systematic approach to validation:

Verify preference configuration

Through the AWS Management Console, navigate to AWS Systems Manager Session Manager, choose Preferences and review the configured Session Manager preferences. Alternatively, verify the configuration through AWS CLI using:

aws ssm get-document --name "SSM-SessionManagerRunShell" --document-version \$LATEST

Validate session functionality

Start a new session following the AWS Systems Manager User Guide and perform the following validations:

Verify the encryption configuration by starting a new session. If session data encryption is enabled, you should see the message This session is encrypted using AWS KMS when the session begins.
For CloudWatch logging verification, navigate to the CloudWatch console and access the Log groups section. Confirm that your specified log group exists and KMS encryption is enabled if you configured it during deployment. Execute some commands in your session and observe the real-time log streaming to your configured log group.
To verify S3 logging, establish a session and execute several commands. Terminate the session and check your configured S3 bucket for the session logs. Remember that S3 logs are only generated after the session is terminated.
If you enabled the RunAsEnabled option, verify the configuration by executing the whoami command in your session. The output should match your configured RunAs user.

Resources

The following is a list of resources created by this solution:

AWS::Lambda::Function (UpdateSessionManagerFunction)
This resource creates a Lambda function that:

Updates the SSM-SessionManagerRunShell document with the specified preferences
Handles CloudFormation create, update, and delete events
Performs deep comparison of document contents to avoid unnecessary updates
Includes error handling and retry logic

AWS::IAM::Role (LambdaExecutionRole)
This resource creates an IAM role that allows the Lambda function to:

Execute with basic Lambda permissions
Access and modify the SSM-SessionManagerRunShell document
Access SSM parameters storing session data encryption key ID

AWS::KMS::Key (SessionDataKMSKey)
This conditional resource creates a KMS key for encrypting session data when SessionDataEncryption parameter is set to enabled. The key has a policy allowing key management with IAM.

AWS::KMS::Alias (SessionDataKeyAlias)
This conditional resource creates a friendly alias (alias/session-manager/data) for the session data encryption key. This value cannot be changed.

AWS::SSM::Parameter (SessionKeyID)
This conditional resource creates an Systems Manager parameter to store the KMS key ID for session data encryption, making it accessible to other components.

Note: The session data KMS key ID is stored in a Systems Manager parameter to decouple components and help prevent circular dependency and failures to due race conditions.

AWS::KMS::Key (SessionLogsKMSKey)
This conditional resource creates a KMS key for encrypting CloudWatch logs when CWEncryptionEnabled parameter is set to enabled. The key has a policy allowing CloudWatch Logs service to use it

Note: SessionLogsKMSKey is used to encrypt logs at-rest and is not used by the SSM Agent, so your instance profile does not need to have permission to this key. Logs are encrypted in-transit and will be encrypted by CloudWatch service after they are received.

AWS::KMS::Alias (SessionLogsKeyAlias)
This conditional resource creates a friendly alias (alias/session-manager/logs) for the CloudWatch Logs encryption key.

AWS::Logs::LogGroup (SessionManagerLogGroup)
This conditional resource creates a CloudWatch Logs group for session logs when the CreateCWLogGroup paremeter is set to enabled. The log group:

Uses the specified name (controlled by the CWLogGroupName parameter, and defaults to ssm-sessionmanager-logs)
Sets a 90-day retention period
Uses KMS encryption if enabled

Custom::UpdateSessionManager (UpdateSessionManagerCustomResource)
This custom resource invokes the Lambda function to update the SSM-SessionManagerRunShell document with the specified preferences.

Parameter groups

The following template parameters are available for customizing Session Manager behavior:

Parameter group	Parameters	Description
S3 logging	`S3Logging, S3BucketName, S3KeyPrefix, S3EncryptionEnabled`	Controls logging to Amazon S3
CloudWatch logging	`CreateCWLogGroup, CWLogGroupName, CWEncryptionEnabled, CWStreamingEnabled`	Controls logging to CloudWatch Logs
Encryption	`SessionDataEncryption`	Controls encryption of session data
Session controls	`RunAsEnabled, RunAsDefaultUser, IdleSessionTimeout, MaxSessionDuration`	Controls session behavior
Shell profiles	`WinShellProfile, LinuxShellProfile`	Controls shell environment

Conclusion

In this post, we explored how to implement and manage Session Manager preferences across your organization using CloudFormation StackSets. This solution enables centralized management of Session Manager configurations across multiple accounts and Regions from a single account, significantly simplifying the administration of remote access to your compute resources. Through automated deployment of security controls including session encryption, logging, and access restrictions, the solution helps facilitate consistent compliance with organizational security requirements while reducing manual configuration efforts and the risk of human error. As your organization grows, this solution scales seamlessly to accommodate new accounts and Regions while maintaining uniform security standards across your infrastructure.

Remember to regularly review and update your Session Manager preferences to align with evolving security requirements and organizational needs. For more information about AWS Systems Manager Session Manager, visit the official AWS documentation.

If you have feedback about this post, submit comments in the Comments section below.

Build priority-based message processing with Amazon MQ and AWS App Runner

2025-11-18 Aritra Nag

Post Syndicated from Aritra Nag original https://aws.amazon.com/blogs/architecture/build-priority-based-message-processing-with-amazon-mq-and-aws-app-runner/

Organizations need message processing systems that can prioritize critical business operations while handling routine tasks efficiently. When handling time-sensitive tasks like rush orders from key customers, critical system alerts, or multi-step business processes, you need to prioritize urgent messages while making sure other routine requests are processed reliably.

In this post, we show you how to build a priority-based message processing system using Amazon MQ for priority queuing, Amazon DynamoDB for data persistence, and AWS App Runner for serverless compute. We demonstrate how to implement application-level delays that high-priority messages can bypass, create real-time UIs with WebSocket connections, and configure dual-layer retry mechanisms for maximum reliability.

This solution addresses three critical challenges in modern data processing systems:

Implementing configurable delay processing at the application level
Supporting priority-based message routing that respects business requirements
Providing real-time feedback to users through WebSocket connections

The use of AWS managed services reduces operational complexity, so teams can focus on business logic rather than infrastructure management. Message handling with priority-based processing makes sure operations receive attention while routine tasks are processed in the background. Users will experience status updates that provide visibility into their requests, while retry mechanisms provide reliability during failures. The infrastructure as code (IaC) approach supports deployments across different environments, from development through production.

Solution overview

The solution consists of several AWS managed services to create a serverless, priority-based message processing system with real-time user feedback. The architecture implements intelligent routing based on three message priority levels, to make sure critical messages receive immediate processing:

High-priority path – Messages bypass delays and queue immediately with JMS priority 9
Standard-priority path – Messages undergo configured delays before queuing with JMS priority 4
Low-priority path – Messages process after all higher priority messages with JMS priority 0

The following diagram illustrates this architecture.

The solution uses the following AWS managed services to deliver a scalable, serverless architecture:

AWS App Runner is a fully managed container application service that automatically builds, deploys, and scales containerized applications. It provides automatic scaling based on traffic, built-in load balancing and HTTPS, seamless integration with container registries, and zero infrastructure management overhead.
Amazon MQ is a managed message broker service for Apache ActiveMQ that offers priority-based message queuing, automatic failover for high availability, message persistence and durability, and JMS protocol support for enterprise applications.
Amazon DynamoDB is a fully managed NoSQL database service providing single-digit millisecond performance at any scale, automatic scaling with on-demand pricing, built-in security and backup capabilities, and global tables for multi-Region deployments.

The system uses JMS priority levels with High=9, Medium=4, and Low=0 for automatic ordering, combined with conditional delay processing based on priority classification. Amazon MQ provides reliable message delivery and persistence with dead-letter queue (DLQ) configuration for failed message handling.

Asynchronous delay processing uses CompletableFuture implementation for non-blocking delays, thread pool management for concurrent processing, graceful error handling with retry mechanisms, and configurable delay periods per message type to optimize resource utilization. For real-time status updates, the solution provides WebSocket connections for bidirectional communication, Amazon DynamoDB Streams for change data capture (CDC), comprehensive status tracking throughout the processing lifecycle, and a React frontend integration for live updates, so users have complete visibility into their message processing status.

The standard priority messaging flow (shown in the following diagram) handles messages with configurable delays using JMS asynchronous processing capabilities. Messages wait for their specified delay period before entering the Amazon MQ queue, where they’re processed.

The high-priority messaging flow (shown in the following diagram) provides an express lane for critical messages. These messages skip the delay mechanism entirely and proceed directly to the queue, providing immediate processing for time-sensitive operations.

To make it even more straightforward to get started, we’ve prepared an example application that you can use to observe the Amazon MQ behavior with varying message volumes. You can find the source code repository, IaC implementation, and instructions to run the sample on GitHub.

In the following sections, we walk you through deploying the complete processing system.

Prerequisites

Make sure you have the following tools, permissions, and knowledge to successfully deploy the priority-based message processing system. You must have an active AWS account with the following configurations:

The AWS Command Line Interface (AWS CLI) version 2.15.0 or later
AWS Identity and Access Management (IAM) roles with the minimum permissions:

# JSON
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
"apprunner:CreateService",
"apprunner:UpdateService",
"apprunner:DeleteService"
      ],
      "Resource": "arn:aws:apprunner:*:*:service/reactive-demo-*"
    },
    {
      "Effect": "Allow",
      "Action": [
"mq:SendMessage",
"mq:ReceiveMessage",
"mq:DeleteMessage"
      ],
      "Resource": "arn:aws:mq:*:*:broker/reactive-demo-broker/*"
    },
    {
      "Effect": "Allow",
      "Action": [
"dynamodb:PutItem",
"dynamodb:GetItem",
"dynamodb:UpdateItem",
"dynamodb:Query"
      ],
      "Resource": "arn:aws:dynamodb:*:*:table/reactive-items*"
    }
  ]
}

Install and configure the following development tools on your local machine:

Java 17 or later
Spring Boot 3.2.0 or later
Node.js 18.19.0 or later
Docker 24.0.7 or later
Maven 3.9.6 or later
AWS CDK 2.124.0 or later

To successfully implement this solution, you should have basic familiarity with the following:

Spring Boot applications
Message queue concepts
WebSocket protocols
React development

Configure the infrastructure stack

This step involves creating the core AWS services using the AWS Cloud Development Kit (AWS CDK). This modular approach enables independent stack management and environment-specific configurations.

Create a new AWS CDK project:

# Bash
mkdir priority-processing && cd priority-processing
cdk init app --language python
pip install aws-cdk-lib constructs

Create the infrastructure stack:

# Python
from aws_cdk import (
    Stack,
    aws_dynamodb as dynamodb,
    aws_amazonmq as mq,
    aws_kms as kms,
    Duration,
    RemovalPolicy,
    CfnOutput
)
from constructs import Construct

class MessageProcessingStack(Stack):
    def __init__(self, scope: Construct, construct_id: str, **kwargs) -> None:
super().__init__(scope, construct_id, **kwargs)

# Create KMS key for encryption
self.kms_key = kms.Key(
    self, "ProcessingKey",
    description="Key for message processing encryption",
    enable_key_rotation=True
)

# DynamoDB table with comprehensive configuration
self.items_table = dynamodb.Table(
    self, "ItemsTable",
    table_name="reactive-items",
    partition_key=dynamodb.Attribute(
name="id",
type=dynamodb.AttributeType.STRING
    ),
    stream=dynamodb.StreamViewType.NEW_AND_OLD_IMAGES,
    billing_mode=dynamodb.BillingMode.ON_DEMAND,
    encryption=dynamodb.TableEncryption.CUSTOMER_MANAGED,
    encryption_key=self.kms_key,
    point_in_time_recovery=True,
    removal_policy=RemovalPolicy.DESTROY
)

# Add Global Secondary Index for status queries
self.items_table.add_global_secondary_index(
    index_name="StatusIndex",
    partition_key=dynamodb.Attribute(
name="status",
type=dynamodb.AttributeType.STRING
    ),
    sort_key=dynamodb.Attribute(
name="createdAt",
type=dynamodb.AttributeType.STRING
    )
)

# Amazon MQ broker configuration
self.mq_broker = mq.CfnBroker(
    self, "MessageBroker",
    broker_name="reactive-demo-broker",
    engine_type="ACTIVEMQ",
    engine_version="5.18",
    host_instance_type="mq.t3.micro",
    deployment_mode="SINGLE_INSTANCE",
    publicly_accessible=False,
    logs=mq.CfnBroker.LogListProperty(
audit=True,
general=True
    ),
    encryption_options=mq.CfnBroker.EncryptionOptionsProperty(
use_aws_owned_key=False,
kms_key_id=self.kms_key.key_id
    ),
    users=[mq.CfnBroker.UserProperty(
username="admin",
password="SecurePassword123!",
console_access=True
    )]
)

# Output values for application configuration
CfnOutput(self, "TableName", 
    value=self.items_table.table_name,
    description="DynamoDB table name")
CfnOutput(self, "MQBrokerEndpoint",
    value=self.mq_broker.attr_amqp_endpoints[0],
    description="Amazon MQ broker endpoint")

Run the following commands to deploy the stack:

# Bash
cdk bootstrap
cdk deploy MessageProcessingStack

You can verify the infrastructure on the AWS Management Console.

Configure the message processing application

In this step, we create the Spring Boot application with priority-based message processing capabilities. First, we configure the application.properties file to incorporate environment variables, including AWS credentials, AWS Regions, and other configuration parameters such as log levels into the application and business logic implementation. Next, we implement the message service using a JMS template with comprehensive error handling, followed by enhancing the JMS configuration with connection pooling for improved performance.

The following code illustrates an example message service implementation:

// Example message service implementation
@Service
public class MessageService {
    @Autowired
    private JmsTemplate jmsTemplate;
    
    public void sendPriorityMessage(Message message) {
jmsTemplate.send(session -> {
    Message jmsMessage = session.createTextMessage(message.getContent());
    jmsMessage.setJMSPriority(message.getPriority());
    return jmsMessage;
});
    }
}

For proper timestamp update implementation, we integrate the DynamoDB SDK service with caching capabilities. Finally, after implementing the REST controller for the API with asynchronous processing support, we can deploy the message processing application. This implementation includes Java code application-level delay processing for demonstration purposes. Although this approach effectively showcases the priority-based message routing capabilities and real-time WebSocket updates in our demo environment, AWS recommends using Amazon MQ delay processing features for production workloads. For production implementations, use Amazon MQ delay and scheduling capabilities instead of application-level delays through features like Amazon MQ delay queues, ActiveMQ scheduling features, and appropriate message Time-to-Live (TTL) configurations.

The following code is an example snippet showcasing the Amazon MQ feature:

// Create connection factory with Amazon MQ endpoint
ActiveMQConnectionFactory factory = new ActiveMQConnectionFactory(brokerUrl);
factory.setUserName("admin");
factory.setPassword("your-password");
try (Connection connection = factory.createConnection();
     Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE)) {
    
    // Create destination and producer
    Destination destination = session.createQueue(queueName);
    MessageProducer producer = session.createProducer(destination);
    
    // Create message
    TextMessage message = session.createTextMessage(messageContent);
    
    // Set native delay using ActiveMQ scheduled delivery
    message.setLongProperty(ScheduledMessage.AMQ_SCHEDULED_DELAY, delayMillis);
    
    // Optionally set priority for delayed message
    message.setJMSPriority(4);
    
    // Send the message - it will be delivered after the specified delay
    producer.send(message);
}

Build and deploy the Spring Boot application to App Runner

In this step, we push the application to Amazon Elastic Container Registry (Amazon ECR) to run it in App Runner:

Build and push the Docker image to Amazon ECR:

# Bash

# Build the Docker image
docker build -t reactive-demo .

# Create ECR repository
aws ecr create-repository --repository-name reactive-demo --region us-east-1

# Get login token and login to ECR
aws ecr get-login-password --region us-east-1 | docker login --username AWS --password-stdin $ECR_URI

# Tag and push image
ECR_URI=$(aws ecr describe-repositories --repository-names reactive-demo --query 'repositories[0].repositoryUri' --output text)
docker tag reactive-demo:latest $ECR_URI:latest
docker push $ECR_URI:latest

Create the App Runner service with environment variables for the DynamoDB table and Amazon MQ broker endpoint:

# Python

from aws_cdk import (
    aws_apprunner as apprunner,
    aws_iam as iam
)

class AppRunnerStack(Stack):
    def __init__(self, scope: Construct, id: str, 
 table_name: str, mq_endpoint: str, **kwargs):
super().__init__(scope, id, **kwargs)

# Create IAM role for App Runner
app_runner_role = iam.Role(
    self, "AppRunnerRole",
    assumed_by=iam.ServicePrincipal("tasks.apprunner.amazonaws.com"),
    managed_policies=[
iam.ManagedPolicy.from_aws_managed_policy_name(
    "AmazonDynamoDBFullAccess"
),
iam.ManagedPolicy.from_aws_managed_policy_name(
    "AmazonMQFullAccess"
)
    ]
)

# Create App Runner service
self.service = apprunner.CfnService(
    self, "ReactiveProcessingService",
    service_name="reactive-processing-service",
    source_configuration=apprunner.CfnService.SourceConfigurationProperty(
authentication_configuration=apprunner.CfnService.AuthenticationConfigurationProperty(
    access_role_arn=app_runner_role.role_arn
),
image_repository=apprunner.CfnService.ImageRepositoryProperty(
    image_identifier=f"{ECR_URI}:latest",
    image_configuration=apprunner.CfnService.ImageConfigurationProperty(
port="8080",
runtime_environment_variables=[
    {"name": "DYNAMODB_TABLE_NAME", "value": table_name},
    {"name": "MQ_BROKER_URL", "value": mq_endpoint}
]
    ),
    image_repository_type="ECR"
)
    ),
    health_check_configuration=apprunner.CfnService.HealthCheckConfigurationProperty(
path="/actuator/health",
protocol="HTTP",
interval=10,
timeout=5,
healthy_threshold=1,
unhealthy_threshold=5
    ),
    instance_configuration=apprunner.CfnService.InstanceConfigurationProperty(
cpu="0.5 vCPU",
memory="1 GB"
    )
)

Set up real-time updates

For this step, we implement WebSocket support for real-time status updates using AWS Lambda to process DynamoDB streams and send updates to connected clients using Amazon API Gateway WebSocket connections. You can find the code snippet for this in this link

Deploy the React application to Amazon S3 and Amazon CloudFront

In this step, we create a frontend application to enable the WebSocket connection for seeing the messaging getting updated in the DynamoDB and API Gateway WebSocket connections.

Similar to the above section, here is the AWS cdk code for building the frontend for proceeding towards the validation of the solution

Validate the solution

This section provides comprehensive testing procedures to validate the priority-based message processing system.

Automated testing script

After you have completed the preceding steps, you can initiate a comprehensive testing script to validate priority processing and delay behavior:

# Bash
#!/bin/bash
curl -X POST "$API_URL/api/items" \
  -H "Content-Type: application/json" \
  -d '{
    "title": "High Priority Task",
    "priority": "High",
    "delay": 10
  }'

Validation through the web interface

The following screenshot of the UI illustrates how the queueing mechanism can work with the real-time updates using WebSockets.

The web interface provides validation of the priority-based message processing system. Access the Amazon CloudFront URL to view the following information:

Real-time message processing with live status updates
Queue statistics showing message distribution by priority
Processing timeline demonstrating priority bypass behavior
WebSocket connection status indicating real-time connectivity

Amazon CloudWatch dashboards and alarms

AWS recommends creating Amazon CloudWatch dashboards to track your priority-based message processing system’s performance across multiple dimensions. Monitor message processing by priority levels to make sure high-priority messages are processed first and identify any bottlenecks in your priority routing logic. The following screenshot shows an example dashboard.

You can track queue depth and processing times to understand system load and latency patterns, helping you optimize resource allocation and identify when scaling is needed. Observe DynamoDB performance metrics including read/write capacity consumption, throttling events, and latency to make sure your database layer maintains optimal performance under varying loads.

Additionally, implement application-specific custom metrics such as message processing success rates, retry counts, and business-specific KPIs to gain deeper insights into your application’s behavior and make data-driven decisions for continuous improvement.

Security considerations

AWS recommends implementing comprehensive security measures to safeguard your message processing system. Start by implementing least privilege IAM policies that grant only the minimum permissions required for each component to function, making sure services like App Runner can only access the specific DynamoDB tables and Amazon MQ queues they need. Configure your network architecture using a virtual private cloud (VPC) with private subnets for Amazon MQ, isolating your message broker from direct internet access while maintaining connectivity through NAT gateways for necessary outbound connections.

Enable encryption at rest using AWS Key Management Service (AWS KMS) for DynamoDB tables and Amazon MQ data and enforce encryption in transit by configuring SSL/TLS connections for all service communications, particularly for ActiveMQ broker connections. Finally, configure security groups with minimal access rules that explicitly define allowed traffic between components, restricting inbound connections to only the ports and protocols required for your application to function, such as port 61617 for ActiveMQ SSL connections from App Runner instances.

Cost considerations

The following table contains cost estimates based on the US East (N. Virginia) Region. Actual costs might vary based on your Region, usage patterns, and pricing changes.

Service	Small (1,000 msg/day)	Medium (10,000 msg/day)	Large (100,000 msg/day)
Amazon DynamoDB	$5–10	$25–50	$200–400
Amazon MQ	$15 (t3.micro)	$30 (m5.large)	$120 (m5.xlarge)
AWS App Runner	$20–40	$50–150	$400–800
Amazon API Gateway WebSocket	$3–5	$10–25	$50–100
Amazon CloudWatch Logs	$5–10	$10–20	$30–50
Data Transfer	$5	$10-20	$50-100
Total Estimated Cost	$53–95	$135–295	$850–1,570

Troubleshooting

The following are common issues and their solutions when implementing the priority-based message processing system:

Messages not processing in priority order:
- Verify JMS priority is configured correctly: message.setJMSPriority(priority)
- Check ActiveMQ broker configuration for priority queue support
- Confirm CLIENT_ACKNOWLEDGE mode is properly configured
- Review queue consumer concurrency settings
WebSocket updates not working:
- Verify DynamoDB Streams is enabled on the table
- Check the Lambda function is triggered by stream events
- Validate API Gateway WebSocket configuration and IAM permissions
- Test the WebSocket connection using browser developer tools
Application scaling issues:
- Monitor App Runner metrics in CloudWatch
- Adjust auto scaling configuration based on traffic patterns
- Consider Amazon MQ broker capacity and upgrade if needed
- Review DynamoDB capacity settings and enable auto scaling

Clean up

To avoid incurring ongoing AWS charges, delete the resources you created in this walkthrough:

Delete the CDK stacks:

cdk destroy MessageProcessingStack
cdk destroy FrontendStack

Remove the App Runner service:

aws apprunner delete-service --service-arn <your-service-arn>

Delete the ECR repositories and container images.
Remove CloudWatch log groups if not set to auto-delete.
Delete S3 buckets used for frontend hosting.

Next steps

To extend this solution and add additional capabilities, consider the following enhancements:

Integrate AWS Step Functions for complex workflow orchestration
Add Amazon EventBridge for event-driven integrations
Implement Lambda for serverless message processing
Use Amazon SageMaker for ML-based priority classification
Build dashboards with Amazon QuickSight for business intelligence

Conclusion

This solution demonstrates how to build a production-ready priority-based message processing system using AWS managed services. By combining Amazon MQ priority queuing with DynamoDB real-time streams and App Runner serverless compute, you create a resilient architecture that intelligently handles messages based on business priorities.The implementation of application-level delays with priority bypass makes sure critical messages receive immediate attention, and the dual-layer retry mechanism provides maximum reliability. Real-time WebSocket updates keep users informed of processing status, creating a responsive and transparent system.To learn more about the services and patterns used in this solution, explore the following resources:

AWS documentation and guides:
Reference architectures and solutions:
- Distributed Load Testing on AWS
Video tutorials:
- AWS re:Invent 2019: Moving to event-driven architectures

About the authors

The State of Security Today: Setting the Stage for 2026

2025-11-18 Rapid7

Post Syndicated from Rapid7 original https://www.rapid7.com/blog/post/it-security-today-setting-stage-for-2026-predictions-webinar

As we close out 2025, one thing is clear: the security landscape is evolving faster than most organizations can keep up. From surging ransomware campaigns and AI-enhanced phishing to data extortion, geopolitical fallout, and gaps in cyber readiness, the challenges facing security teams today are as varied as they are relentless. But with complexity comes clarity and insight.

This year’s most significant breaches, breakthroughs, and behavioral shifts provide a critical lens through which we can view what’s next. That’s exactly what we’ll explore in our upcoming Security Predictions for 2026 webinar, where Rapid7’s experts will break down where we are now, what to expect next, and how organizations can proactively adapt.

Before we look ahead, let’s take stock of what defined 2025 and what it tells us about the state of cybersecurity today.

Ransomware: Same playbook, more precision

Ransomware remains one of the most consistent and costly threats facing organisations today, but the approach has shifted. According to Rapid7’s Q3 2025 Threat Landscape Report, data extortion continues to dominate, with groups increasingly focused on exfiltration and disruption rather than encryption alone. Over 80% of ransomware cases handled in Q3 involved data theft, often staged and timed to maximise leverage.

Threat actors like RansomHub, BlackSuit, NoEscape, and Scattered Spider continue to refine their operations. Many campaigns are multi-stage and collaborative, with Initial Access Brokers providing footholds that are later sold to ransomware operators. One common thread is a focus on identity and infrastructure abuse – attackers are compromising vSphere environments, exploiting misconfigurations in third-party platforms, and abusing legitimate remote access tools to move laterally before launching extortion phases.

These incidents increasingly target complex organizations with sprawling digital footprints. The result? Weeks of operational downtime, lost revenue, regulatory scrutiny, and enduring brand damage. In this landscape, ransomware is no longer just a malware problem – it’s a business continuity issue, a supply chain risk, and a board-level concern.

The offense is automated: AI goes to work

This year, we saw AI break through hype and land firmly in attackers’ toolkits. Tools like WormGPT, FraudGPT, and DarkBERT gave cybercriminals an entry point to generate convincing phishing emails, polymorphic malware, and credential-harvesting scripts, all without needing advanced coding skills.

In our AI Offense blog, we detailed how these tools lower the barrier to entry and amplify the volume and sophistication of social engineering campaigns. Pair that with deepfakes, cloned voices, and LLM-powered targeting, and security teams now face threats that are faster, cheaper, and harder to detect than ever before.

The takeaway? AI is not a future threat. It is here. And defenders must embrace its potential just as aggressively as attackers have.

The human factor: Still the weakest link

Despite improved tooling, attacker playbooks still rely heavily on people. Our recent exploration of evolving social engineering trends highlighted the rise of Microsoft Teams-based impersonation, remote access tool abuse such as Quick Assist, and multi-stage credential compromise.

The fallout has been widespread. From attacks on major UK retailers to multiple airline disruptions and critical public sector breaches, social engineering is no longer just email phishing. It is phone calls, voice cloning, fake calendars, and chat-based manipulation.

Training helps. But attackers are innovating faster than awareness campaigns can keep up. Security teams need to simulate these threats internally and invest in visibility across identity platforms, because credentials remain the crown jewels.

From awareness to action: Resilience as a mandate

A growing number of incidents in 2025 underscored the readiness gap in many organizations. Our recent blog on preparedness broke down the UK’s National Cyber Security Centre guidance urging companies to revisit their offline contingency planning, including printed IR protocols and analog communications in case digital systems are taken offline.

This call followed a sharp rise in high-impact events, with over 200 nationally significant cyber incidents recorded in the UK alone this year.

The lesson? Cyber resilience is not a nice to have. It is foundational. Detection, backup, and patching are essential, but so is building response plans that assume failure, simulate outages, and bring the entire business to the table.

Join us: Predicting what’s next in 2026

We’ll explore these trends and where they’re heading in much greater depth in our Security Predictions for 2026 webinar, taking place on December 10.

Rapid7’s experts will unpack:

Which attacker tactics are here to stay and which are on the rise
Where AI, regulation, and infrastructure gaps are creating new exposures
How defenders can better prioritise risk and operate in resource-constrained environments
What CISOs, SOC leaders, and engineers need to align on in 2026 to stay ahead

This is our biggest global webinar of the year, and it is designed to help security professionals at every level get proactive and stay ahead of what’s next.

Register now and join thousands of security professionals from around the world as we set the stage for 2026. Because when the threat landscape keeps shifting, your best defense is a head start.

Analogue 3D Review – New N64 Recreated in FPGA!

2025-11-18 LGR

Post Syndicated from LGR original https://www.youtube.com/watch?v=hgMAl0Rrhq4

[$] The current state of Linux architecture support

2025-11-18 daroc

Post Syndicated from daroc original https://lwn.net/Articles/1045363/

There have been several recent announcements about Linux distributions changing
the list of architectures they support, or adjusting how they build binaries for
some versions of those architectures.
Ubuntu introduced architecture variants, Fedora
considered dropping support for i686 but
reversed course after some pushback, and Debian developers

have discussed raising its architecture baseline for the upcoming

Debian 14
(“forky”).
Linux supports a large number of architectures, and it’s not always
clear where or by whom they are used.
With increasing concerns about diminishing support for legacy
architectures, it’s a good time to look at the overall state of architecture
support on Linux.

[$] Pouring packages with Homebrew

2025-11-18 jzb

Post Syndicated from jzb original https://lwn.net/Articles/1046236/

The Homebrew project is an
open-source package-management system that comes with a repository of
useful packages for Linux and macOS. Even though Linux distributions
have their own package management and repositories, Homebrew is often
used to obtain software that is not available in a distribution’s repository
or to install more current versions of projects than are available
from long-term-support (LTS) distributions. Homebrew 5.0.0,
released on November 12, 2025, expanded Linux support to include
64-bit Arm packages in addition to x86_64, and turned on concurrent
downloads by default to speed up package downloads.

Making PaperCut NG Observable with Zabbix

2025-11-18 Patrik Uytterhoeven

Post Syndicated from Patrik Uytterhoeven original https://blog.zabbix.com/making-papercut-ng-observable-with-zabbix/31244/

In most organizations, printing is an essential but often invisible service. When it works, nobody notices. When it fails, productivity stalls. That’s why monitoring your print environment is just as important as monitoring servers, databases, or network devices.

At Opensource ICT Solutions, we specialize in turning complex systems into observable services. One recent example is our integration of PaperCut NG with Zabbix. This allows IT teams to track the health of their print infrastructure in real-time — everything from server resources to individual printers and devices.

Why monitoring PaperCut matters

PaperCut NG does much more than queue print jobs. It enforces quotas, integrates with authentication systems, and manages fleets of devices. If the database runs out of connections, the disk fills up, or the license expires, users feel the impact instantly.

By integrating PaperCut with Zabbix, we make these risks visible long before they become business problems. The result is:

Proactive detection of printer errors, low toner, or license issues.
Capacity planning through trend analysis of disk usage, memory, and DB connections.
Unified visibility — PaperCut health checks appear right alongside servers, networks, and applications in Zabbix dashboards.

How the integration works

The magic happens through the PaperCut System Health API and Zabbix’s flexible data collection methods.

HTTP agent items

Zabbix fetches raw JSON data directly from PaperCut using an HTTP agent item, such as:

This single call provides a full snapshot of server health.

Dependent items + JSONPATH

Instead of hammering the API with multiple requests, we extract the needed fields using dependent items with JSONPATH preprocessing.

For example:

This design means one request can populate dozens of metrics, keeping monitoring both efficient and lightweight.

Calculated items

Some values aren’t directly available from PaperCut. In those cases, we create calculated items inside Zabbix.

For example, the percentage of active DB connections is derived as:

This allows us to set intelligent triggers like “DB connections > 90%” without requiring PaperCut to calculate it for us.

Low-level discovery (LLD) for devices and printers

Perhaps the most powerful part of this integration is automatic discovery.

Printer LLD → Queries /api/health/printers and creates items and triggers per printer. If a printer goes into Paper Jam or No Toner, Zabbix knows immediately.
Device LLD → Queries /api/health/devices and builds items dynamically for each discovered device, tracking states like OK, WARNING, or ERROR.

This ensures that new printers and devices are monitored automatically — no manual configuration required!

Why this matters

Bringing all of this together, the integration turns PaperCut NG into a fully observable service inside Zabbix.

Efficiency → One API call, dozens of metrics.
Scalability → Automatic discovery of printers and devices.
Robustness → Alerts and dashboards for licenses, resources, and print queues.

For IT teams, this means fewer surprises, faster troubleshooting, and more confidence in a service that often goes unnoticed until it fails.

Our expertise

This PaperCut integration is just one example of how we at Opensource ICT Solutions help organizations unlock the full potential of Zabbix. We don’t just install monitoring – we design intelligent, scalable integrations that make hidden systems visible. Whether it’s print management, databases, custom applications, or network devices, we know how to extend Zabbix to fit your environment and give you the insights that matter most.

Feel free to download our template and documentation for free from our GitHub: https://github.com/OpensourceICTSolutions/ZabbixPapercutNG

Want to make your business-critical systems truly observable? Let’s talk about how we can tailor Zabbix to your needs: [email protected]

The post Making PaperCut NG Observable with Zabbix appeared first on Zabbix Blog.

Security updates for Tuesday

2025-11-18 jzb

Post Syndicated from jzb original https://lwn.net/Articles/1046891/

Security updates have been issued by Debian (libwebsockets), Fedora (chromium and fvwm3), Mageia (apache, firefox, and postgresql13, postgresql15), Oracle (idm:DL1), Red Hat (bind, bind9.18, firefox, and openssl), SUSE (alloy, ghostscript, and openssl-1_0_0), and Ubuntu (ffmpeg and freeglut).

Single Light Beauty Portrait Challenge

2025-11-18 Matt Granger

Post Syndicated from Matt Granger original https://www.youtube.com/watch?v=tSAqZxpF6WI

AI and Voter Engagement

2025-11-18 Bruce Schneier

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2025/11/ai-and-voter-engagement.html

Social media has been a familiar, even mundane, part of life for nearly two decades. It can be easy to forget it was not always that way.

In 2008, social media was just emerging into the mainstream. Facebook reached 100 million users that summer. And a singular candidate was integrating social media into his political campaign: Barack Obama. His campaign’s use of social media was so bracingly innovative, so impactful, that it was viewed by journalist David Talbot and others as the strategy that enabled the first term Senator to win the White House.

Over the past few years, a new technology has become mainstream: AI. But still, no candidate has unlocked AI’s potential to revolutionize political campaigns. Americans have three more years to wait before casting their ballots in another Presidential election, but we can look at the 2026 midterms and examples from around the globe for signs of how that breakthrough might occur.

How Obama Did It

Rereading the contemporaneous reflections of the New York Times’ late media critic, David Carr, on Obama’s campaign reminds us of just how new social media felt in 2008. Carr positions it within a now-familiar lineage of revolutionary communications technologies from newspapers to radio to television to the internet.

The Obama campaign and administration demonstrated that social media was different from those earlier communications technologies, including the pre-social internet. Yes, increasing numbers of voters were getting their news from the internet, and content about the then-Senator sometimes made a splash by going viral. But those were still broadcast communications: one voice reaching many. Obama found ways to connect voters to each other.

In describing what social media revolutionized in campaigning, Carr quotes campaign vendor Blue State Digital’s Thomas Gensemer: “People will continue to expect a conversation, a two-way relationship that is a give and take.”

The Obama team made some earnest efforts to realize this vision. His transition team launched change.gov, the website where the campaign collected a “Citizen’s Briefing Book” of public comment. Later, his administration built We the People, an online petitioning platform.

But the lasting legacy of Obama’s 2008 campaign, as political scientists Hahrie Han and Elizabeth McKenna chronicled, was pioneering online “relational organizing.” This technique enlisted individuals as organizers to activate their friends in a self-perpetuating web of relationships.

Perhaps because of the Obama campaign’s close association with the method, relational organizing has been touted repeatedly as the linchpin of Democratic campaigns: in 2020, 2024, and today. But research by non-partisan groups like Turnout Nation and right-aligned groups like the Center for Campaign Innovation has also empirically validated the effectiveness of the technique for inspiring voter turnout within connected groups.

The Facebook of 2008 worked well for relational organizing. It gave users tools to connect and promote ideas to the people they know: college classmates, neighbors, friends from work or church. But the nature of social networking has changed since then.

For the past decade, according to Pew Research, Facebook use has stalled and lagged behind YouTube, while Reddit and TikTok have surged. These platforms are less useful for relational organizing, at least in the traditional sense. YouTube is organized more like broadcast television, where content creators produce content disseminated on their own channels in a largely one-way communication to their fans. Reddit gathers users worldwide in forums (subreddits) organized primarily on topical interest. The endless feed of TikTok’s “For You” page disseminates engaging content with little ideological or social commonality. None of these platforms shares the essential feature of Facebook c. 2008: an organizational structure that emphasizes direct connection to people that users have direct social influence over.

AI and Relational Organizing

Ideas and messages might spread virally through modern social channels, but they are not where you convince your friends to show up at a campaign rally. Today’s platforms are spaces for political hobbyism, where you express your political feelings and see others express theirs.

Relational organizing works when one person’s action inspires others to do this same. That’s inherently a chain of human-to-human connection. If my AI assistant inspires your AI assistant, no human notices and one’s vote changes. But key steps in the human chain can be assisted by AI. Tell your phone’s AI assistant to craft a personal message to one friend—or a hundred—and it can do it.

So if a campaign hits you at the right time with the right message, they might persuade you to task your AI assistant to ask your friends to donate or volunteer. The result can be something more than a form letter; it could be automatically drafted based on the entirety of your email or text correspondence with that friend. It could include references to your discussions of recent events, or past campaigns, or shared personal experiences. It could sound as authentic as if you’d written it from the heart, but scaled to everyone in your address book.

Research suggests that AI can generate and perform written political messaging about as well as humans. AI will surely play a tactical role in the 2026 midterm campaigns, and some candidates may even use it for relational organizing in this way.

(Artificial) Identity Politics

For AI to be truly transformative of politics, it must change the way campaigns work. And we are starting to see that in the US.

The earliest uses of AI in American political campaigns are, to be polite, uninspiring. Candidates viewed them as just another tool to optimize an endless stream of email and text message appeals, to ramp up political vitriol, to harvest data on voters and donors, or merely as a stunt.

Of course, we have seen the rampant production and spread of AI-powered deepfakes and misinformation. This is already impacting the key 2026 Senate races, which are likely to attract hundreds of millions of dollars in financing. Roy Cooper, Democratic candidate for US Senate from North Carolina, and Abdul El-Sayed, Democratic candidate for Senate from Michigan, were both targeted by viral deepfake attacks in recent months. This may reflect a growing trend in Donald Trump’s Republican party in the use of AI-generated imagery to build up GOP candidates and assail the opposition.

And yet, in the global elections of 2024, AI was used more memetically than deceptively. So far, conservative and far right parties seem to have adopted this most aggressively. The ongoing rise of Germany’s far-right populist AfD party has been credited to its use of AI to generate nostalgic and evocative (and, to many, offensive) campaign images, videos, and music and, seemingly as a result, they have dominated TikTok. Because most social platforms’ algorithms are tuned to reward media that generates an emotional response, this counts as a double use of AI: to generate content and to manipulate its distribution.

AI can also be used to generate politically useful, though artificial, identities. These identities can fulfill different roles than humans in campaigning and governance because they have differentiated traits. They can’t be imprisoned for speaking out against the state, can be positioned (legitimately or not) as unsusceptible to bribery, and can be forced to show up when humans will not.

In Venezuela, journalists have turned to AI avatars—artificial newsreaders—to report anonymously on issues that would otherwise elicit government retaliation. Albania recently “appointed” an AI to a ministerial post responsible for procurement, claiming that it would be less vulnerable to bribery than a human. In Virginia, both in 2024 and again this year, candidates have used AI avatars as artificial stand-ins for opponents that refused to debate them.

And yet, none of these examples, whether positive or negative, pursue the promise of the Obama campaign: to make voter engagement a “two-way conversation” on a massive scale.

The closest so far to fulfilling that vision anywhere in the world may be Japan’s new political party, Team Mirai. It started in 2024, when an independent Tokyo gubernatorial candidate, Anno Takahiro, used an AI avatar on YouTube to respond to 8,600 constituent questions over a seventeen-day continuous livestream. He collated hundreds of comments on his campaign manifesto into a revised policy platform. While he didn’t win his race, he shot up to a fifth place finish among a record 56 candidates.

Anno was RECENTLY elected to the upper house of the federal legislature as the founder of a new party with a 100 day plan to bring his vision of a “public listening AI” to the whole country. In the early stages of that plan, they’ve invested their share of Japan’s 32 billion yen in party grants—public subsidies for political parties—to hire engineers building digital civic infrastructure for Japan. They’ve already created platforms to provide transparency for party expenditures, and to use AI to make legislation in the Diet easy, and are meeting with engineers from US-based Jigsaw Labs (a Google company) to learn from international examples of how AI can be used to power participatory democracy.

Team Mirai has yet to prove that it can get a second member elected to the Japanese Diet, let alone to win substantial power, but they’re innovating and demonstrating new ways of using AI to give people a way to participate in politics that we believe is likely to spread.

Organizing with AI

AI could be used in the US in similar ways. Following American federalism’s longstanding model of “laboratories of democracy,” we expect the most aggressive campaign innovation to happen at the state and local level.

D.C. Mayor Muriel Bowser is partnering with MIT and Stanford labs to use the AI-based tool deliberation.io to capture wide scale public feedback in city policymaking about AI. Her administration said that using AI in this process allows “the District to better solicit public input to ensure a broad range of perspectives, identify common ground, and cultivate solutions that align with the public interest.”

It remains to be seen how central this will become to Bowser’s expected re-election campaign in 2026, but the technology has legitimate potential to be a prominent part of a broader program to rebuild trust in government. This is a trail blazed by Taiwan a decade ago. The vTaiwan initiative showed how digital tools like Pol.is, which uses machine learning to make sense of real time constituent feedback, can scale participation in democratic processes and radically improve trust in government. Similar AI listening processes have been used in Kentucky, France, and Germany.

Even if campaigns like Bowser’s don’t adopt this kind of AI-facilitated listening and dialog, expect it to be an increasingly prominent part of American public debate. Through a partnership with Jigsaw, Scott Rasmussen’s Napolitan Institute will use AI to elicit and synthesize the views of at least five Americans from every Congressional district in a project called “We the People.” Timed to coincide with the country’s 250th anniversary in 2026, expect the results to be promoted during the heat of the midterm campaign and to stoke interest in this kind of AI-assisted political sensemaking.

In the year where we celebrate the American republic’s semiquincentennial and continue a decade-long debate about whether or not Donald Trump and the Republican party remade in his image is fighting for the interests of the working class, representation will be on the ballot in 2026. Midterm election candidates will look for any way they can get an edge. For all the risks it poses to democracy, AI presents a real opportunity, too, for politicians to engage voters en masse while factoring their input into their platform and message. Technology isn’t going to turn an uninspiring candidate into Barack Obama, but it gives any aspirant to office the capability to try to realize the promise that swept him into office.

This essay was written with Nathan E. Sanders, and originally appeared in The Fulcrum.

Code Club Conference 2025: Creativity, community, and collaboration in Cambridge

2025-11-18 Sarah Lygoe

Post Syndicated from Sarah Lygoe original https://www.raspberrypi.org/blog/code-club-conference-2025-creativity-community-and-collaboration-in-cambridge/

Over the first weekend in November, members of the global Code Club community came together for two inspiring days of learning, creativity and connection. The annual event celebrates the people who make Code Clubs happen, allowing them to share ideas, explore new tools, and connect with others who help young people learn to code.

Educator at Code Club Conference attending a workshop

Exploring new technologies and inclusive teaching

Saturday began with hands-on sessions that brought creativity and technology together, exploring large language models and prompt engineering in Collaborating with LLMs and being a prompt boss. There was a lot of laughter from attendees about how large language models can produce confident but incorrect answers if given vague prompts, but many left inspired to experiment with new technologies in their own clubs.

“First time there and it was amazing. Met loads of great people and the amazing code club crew. I learnt loads of new skills around AI and Arduino.” – An attendee

Explore AI with creators in your club using our AI and machine learning projects.

Educator in a workshop, using a micro:bit

Collaboration that counts brought mentors together to discuss common challenges like volunteer retention, limited resources, and communication barriers. A crowd favourite was a shared volunteer toolkit, as well as event checklists and safeguarding resources.

“What I enjoyed most about the Clubs Conference was the opportunity to meet other facilitators and hear their stories — their successes and challenges. These conversations validated the volunteer work I do and reminded me of the impact of our clubs.” – An attendee

From the theatre sessions, you can watch Inclusive learning – Supporting Deaf learners in clubs which was both moving and insightful. We learnt that visual demonstrations, colour cues, and repetition were key to supporting Deaf learners. One memorable quote captured the spirit of the session:

“The children couldn’t speak to us. The children — we couldn’t hear their voices but by the eighth week we were able to hear their voices from what they built on the screen and it was echoing all around the classroom.” – Chidi Duru

Find out more about Chidi’s joy of coding alongside Deaf creators.

Learning and making across continents

The weekend’s talks showcased the reach of Code Club worldwide, with volunteers sharing their experiences of collaboration, sustainability, and creativity.

Watch Lessons from resourceful Code Clubs in India, which highlighted the ingenuity of young learners in under-resourced settings, while Hands-on with the Raspberry Pi Pico showcased low-cost, high-impact projects from Kenya and South Africa.

Speakers showed how community clubs adapt to local needs with unplugged activities and coding games inspired by cricket and kabaddi, empowering young people to solve real problems and celebrate curiosity through play. Excitingly, these new resources will be launching early next year; keep an eye on our activities page to be among the first to try them out!

Two attendees during a workshop working together

In the session Code Club Projects Unplugged, facilitators shared the idea of “hiding the vegetables” — hiding the learning inside the fun. Whether through a collaborative Scratch game, a micro:bit prop on stage, or a Pico gadget solving a real problem, this approach helps young people learn through play. They remember the joy, and the skills come naturally.

Learning beyond the screen

Teaching tech away from the computer screen shared a fun unplugged cybersecurity activity, The Chicken Shop, where learners role-play social engineering scenarios. Its success came from clear printed instructions, movement, humour, and strong debriefing.

Educators sharing ideas during a workshop

Learning coding outside the box explored how to engage young people with diverse learning styles while the Arduino crash course gave attendees a taste of physical computing and C++ programming in action. Workshops on AI, sustainability, and youth empowerment with Raspberry Pi computers and Unlocking Code Club resources helped club leaders discover practical ways to inspire problem-solving and make use of all the support available through Code Club.

The message from the sessions was clear: young people learn best when technology is human and hands-on.

Showcasing creativity with Coolest Projects

Coolest Projects – get involved! championed creativity over competition. Any young person under 18 can submit their project, including unfinished ideas. In-person and online showcases celebrate progress, imagination, and teamwork.

Speaking on the closing panel, Code Club leader Rachael Coultart talked about the importance of Coolest Projects as a rare platform for children to talk about their learning. She spoke about the experience of one particular child, explaining that it had made a powerful impression on her, saying:

“It had such a huge impact. I felt so proud of her and what she’d achieved. Afterwards, her parents told me that they felt it was the first time she had really been seen.”

What the community is taking forward

The community is united in its commitment to making Code Clubs inclusive, creative, and sustainable.

Context matters — projects that reflect local interests and challenges motivate young people to learn
Accessibility is central: visual cues, repetition, interpreters, and inclusive resources support every learner
Structure builds confidence; start with simple, guided activities before open-ended exploration
Volunteers are vital; shared toolkits, checklists, and training help them deliver engaging sessions
Celebration and affordability matter too: regular showcases and tools like the micro:bit, Pico, and Crumble keep computing fun, hands-on, and accessible for all

“Thank you. Clubs Conference is a highlight of my year.” – An attendee

Stay connected

If you want to stay up to date with the latest news, events and opportunities from Code Club, sign up for our newsletter and be part of the growing global community.

The post Code Club Conference 2025: Creativity, community, and collaboration in Cambridge appeared first on Raspberry Pi Foundation.

Arm Joins the NVIDIA NVLink Fusion Ecosystem

2025-11-18 Cliff Robinson

Post Syndicated from Cliff Robinson original https://www.servethehome.com/arm-joins-the-nvidia-nvlink-fusion-ecosystem-vera/

Arm joins the NVIDIA NVLink Fusion ecosystem, giving custom Arm CPU designers an easier pathway to integrate NVIDIA GPUs

The post Arm Joins the NVIDIA NVLink Fusion Ecosystem appeared first on ServeTheHome.

Cloudflare outage on November 18, 2025

2025-11-18 Matthew Prince

Post Syndicated from Matthew Prince original https://blog.cloudflare.com/18-november-2025-outage/

On 18 November 2025 at 11:20 UTC (all times in this blog are UTC), Cloudflare’s network began experiencing significant failures to deliver core network traffic. This showed up to Internet users trying to access our customers’ sites as an error page indicating a failure within Cloudflare’s network.

The issue was not caused, directly or indirectly, by a cyber attack or malicious activity of any kind. Instead, it was triggered by a change to one of our database systems’ permissions which caused the database to output multiple entries into a “feature file” used by our Bot Management system. That feature file, in turn, doubled in size. The larger-than-expected feature file was then propagated to all the machines that make up our network.

The software running on these machines to route traffic across our network reads this feature file to keep our Bot Management system up to date with ever changing threats. The software had a limit on the size of the feature file that was below its doubled size. That caused the software to fail.

After we initially wrongly suspected the symptoms we were seeing were caused by a hyper-scale DDoS attack, we correctly identified the core issue and were able to stop the propagation of the larger-than-expected feature file and replace it with an earlier version of the file. Core traffic was largely flowing as normal by 14:30. We worked over the next few hours to mitigate increased load on various parts of our network as traffic rushed back online. As of 17:06 all systems at Cloudflare were functioning as normal.

We are sorry for the impact to our customers and to the Internet in general. Given Cloudflare’s importance in the Internet ecosystem any outage of any of our systems is unacceptable. That there was a period of time where our network was not able to route traffic is deeply painful to every member of our team. We know we let you down today.

This post is an in-depth recount of exactly what happened and what systems and processes failed. It is also the beginning, though not the end, of what we plan to do in order to make sure an outage like this will not happen again.

The outage

The chart below shows the volume of 5xx error HTTP status codes served by the Cloudflare network. Normally this should be very low, and it was right up until the start of the outage.

The volume prior to 11:20 is the expected baseline of 5xx errors observed across our network. The spike, and subsequent fluctuations, show our system failing due to loading the incorrect feature file. What’s notable is that our system would then recover for a period. This was very unusual behavior for an internal error.

The explanation was that the file was being generated every five minutes by a query running on a ClickHouse database cluster, which was being gradually updated to improve permissions management. Bad data was only generated if the query ran on a part of the cluster which had been updated. As a result, every five minutes there was a chance of either a good or a bad set of configuration files being generated and rapidly propagated across the network.

This fluctuation made it unclear what was happening as the entire system would recover and then fail again as sometimes good, sometimes bad configuration files were distributed to our network. Initially, this led us to believe this might be caused by an attack. Eventually, every ClickHouse node was generating the bad configuration file and the fluctuation stabilized in the failing state.

Errors continued until the underlying issue was identified and resolved starting at 14:30. We solved the problem by stopping the generation and propagation of the bad feature file and manually inserting a known good file into the feature file distribution queue. And then forcing a restart of our core proxy.

The remaining long tail in the chart above is our team restarting remaining services that had entered a bad state, with 5xx error code volume returning to normal at 17:06.

The following services were impacted:

Service / Product	Impact description
Core CDN and security services	HTTP 5xx status codes. The screenshot at the top of this post shows a typical error page delivered to end users.
Turnstile	Turnstile failed to load.
Workers KV	Workers KV returned a significantly elevated level of HTTP 5xx errors as requests to KV’s “front end” gateway failed due to the core proxy failing.
Dashboard	While the dashboard was mostly operational, most users were unable to log in due to Turnstile being unavailable on the login page.
Email Security	While email processing and delivery were unaffected, we observed a temporary loss of access to an IP reputation source which reduced spam-detection accuracy and prevented some new-domain-age detections from triggering, with no critical customer impact observed. We also saw failures in some Auto Move actions; all affected messages have been reviewed and remediated.
Access	Authentication failures were widespread for most users, beginning at the start of the incident and continuing until the rollback was initiated at 13:05. Any existing Access sessions were unaffected. All failed authentication attempts resulted in an error page, meaning none of these users ever reached the target application while authentication was failing. Successful logins during this period were correctly logged during this incident. Any Access configuration updates attempted at that time would have either failed outright or propagated very slowly. All configuration updates are now recovered.

As well as returning HTTP 5xx errors, we observed significant increases in latency of responses from our CDN during the impact period. This was due to large amounts of CPU being consumed by our debugging and observability systems, which automatically enhance uncaught errors with additional debugging information.

How Cloudflare processes requests, and how this went wrong today

Every request to Cloudflare takes a well-defined path through our network. It could be from a browser loading a webpage, a mobile app calling an API, or automated traffic from another service. These requests first terminate at our HTTP and TLS layer, then flow into our core proxy system (which we call FL for “Frontline”), and finally through Pingora, which performs cache lookups or fetches data from the origin if needed.

We previously shared more detail about how the core proxy works here.

As a request transits the core proxy, we run the various security and performance products available in our network. The proxy applies each customer’s unique configuration and settings, from enforcing WAF rules and DDoS protection to routing traffic to the Developer Platform and R2. It accomplishes this through a set of domain-specific modules that apply the configuration and policy rules to traffic transiting our proxy.

One of those modules, Bot Management, was the source of today’s outage.

Cloudflare’s Bot Management includes, among other systems, a machine learning model that we use to generate bot scores for every request traversing our network. Our customers use bot scores to control which bots are allowed to access their sites — or not.

The model takes as input a “feature” configuration file. A feature, in this context, is an individual trait used by the machine learning model to make a prediction about whether the request was automated or not. The feature configuration file is a collection of individual features.

This feature file is refreshed every few minutes and published to our entire network and allows us to react to variations in traffic flows across the Internet. It allows us to react to new types of bots and new bot attacks. So it’s critical that it is rolled out frequently and rapidly as bad actors change their tactics quickly.

A change in our underlying ClickHouse query behaviour (explained below) that generates this file caused it to have a large number of duplicate “feature” rows. This changed the size of the previously fixed-size feature configuration file, causing the bots module to trigger an error.

As a result, HTTP 5xx error codes were returned by the core proxy system that handles traffic processing for our customers, for any traffic that depended on the bots module. This also affected Workers KV and Access, which rely on the core proxy.

Unrelated to this incident, we were and are currently migrating our customer traffic to a new version of our proxy service, internally known as FL2. Both versions were affected by the issue, although the impact observed was different.

Customers deployed on the new FL2 proxy engine, observed HTTP 5xx errors. Customers on our old proxy engine, known as FL, did not see errors, but bot scores were not generated correctly, resulting in all traffic receiving a bot score of zero. Customers that had rules deployed to block bots would have seen large numbers of false positives. Customers who were not using our bot score in their rules did not see any impact.

Throwing us off and making us believe this might have been an attack was another apparent symptom we observed: Cloudflare’s status page went down. The status page is hosted completely off Cloudflare’s infrastructure with no dependencies on Cloudflare. While it turned out to be a coincidence, it led some of the team diagnosing the issue to believe that an attacker may be targeting both our systems as well as our status page. Visitors to the status page at that time were greeted by an error message:

In the internal incident chat room, we were concerned that this might be the continuation of the recent spate of high volume Aisuru DDoS attacks:

The query behaviour change

I mentioned above that a change in the underlying query behaviour resulted in the feature file containing a large number of duplicate rows. The database system in question uses ClickHouse’s software.

For context, it’s helpful to know how ClickHouse distributed queries work. A ClickHouse cluster consists of many shards. To query data from all shards, we have so-called distributed tables (powered by the table engine Distributed) in a database called default. The Distributed engine queries underlying tables in a database r0. The underlying tables are where data is stored on each shard of a ClickHouse cluster.

Queries to the distributed tables run through a shared system account. As part of efforts to improve our distributed queries security and reliability, there’s work being done to make them run under the initial user accounts instead.

Before today, ClickHouse users would only see the tables in the default database when querying table metadata from ClickHouse system tables such as system.tables or system.columns.

Since users already have implicit access to underlying tables in r0, we made a change at 11:05 to make this access explicit, so that users can see the metadata of these tables as well. By making sure that all distributed subqueries can run under the initial user, query limits and access grants can be evaluated in a more fine-grained manner, avoiding one bad subquery from a user affecting others.

The change explained above resulted in all users accessing accurate metadata about tables they have access to. Unfortunately, there were assumptions made in the past, that the list of columns returned by a query like this would only include the “default” database:

SELECT name, type FROM system.columns WHERE table = 'http_requests_features' order by name;

Note how the query does not filter for the database name. With us gradually rolling out the explicit grants to users of a given ClickHouse cluster, after the change at 11:05 the query above started returning “duplicates” of columns because those were for underlying tables stored in the r0 database.

This, unfortunately, was the type of query that was performed by the Bot Management feature file generation logic to construct each input “feature” for the file mentioned at the beginning of this section.

The query above would return a table of columns like the one displayed (simplified example):

However, as part of the additional permissions that were granted to the user, the response now contained all the metadata of the r0 schema effectively more than doubling the rows in the response ultimately affecting the number of rows (i.e. features) in the final file output.

Memory preallocation

Each module running on our proxy service has a number of limits in place to avoid unbounded memory consumption and to preallocate memory as a performance optimization. In this specific instance, the Bot Management system has a limit on the number of machine learning features that can be used at runtime. Currently that limit is set to 200, well above our current use of ~60 features. Again, the limit exists because for performance reasons we preallocate memory for the features.

When the bad file with more than 200 features was propagated to our servers, this limit was hit — resulting in the system panicking. The FL2 Rust code that makes the check and was the source of the unhandled error is shown below:

This resulted in the following panic which in turn resulted in a 5xx error:

thread fl2_worker_thread panicked: called Result::unwrap() on an Err value

Other impact during the incident

Other systems that rely on our core proxy were impacted during the incident. This included Workers KV and Cloudflare Access. The team was able to reduce the impact to these systems at 13:04, when a patch was made to Workers KV to bypass the core proxy. Subsequently, all downstream systems that rely on Workers KV (such as Access itself) observed a reduced error rate.

The Cloudflare Dashboard was also impacted due to both Workers KV being used internally and Cloudflare Turnstile being deployed as part of our login flow.

Turnstile was impacted by this outage, resulting in customers who did not have an active dashboard session being unable to log in. This showed up as reduced availability during two time periods: from 11:30 to 13:10, and between 14:40 and 15:30, as seen in the graph below.

The first period, from 11:30 to 13:10, was due to the impact to Workers KV, which some control plane and dashboard functions rely upon. This was restored at 13:10, when Workers KV bypassed the core proxy system.

The second period of impact to the dashboard occurred after restoring the feature configuration data. A backlog of login attempts began to overwhelm the dashboard. This backlog, in combination with retry attempts, resulted in elevated latency, reducing dashboard availability. Scaling control plane concurrency restored availability at approximately 15:30.

Remediation and follow-up steps

Now that our systems are back online and functioning normally, work has already begun on how we will harden them against failures like this in the future. In particular we are:

Hardening ingestion of Cloudflare-generated configuration files in the same way we would for user-generated input
Enabling more global kill switches for features
Eliminating the ability for core dumps or other error reports to overwhelm system resources
Reviewing failure modes for error conditions across all core proxy modules

Today was Cloudflare’s worst outage since 2019. We’ve had outages that have made our dashboard unavailable. Some that have caused newer features to not be available for a period of time. But in the last 6+ years we’ve not had another outage that has caused the majority of core traffic to stop flowing through our network.

An outage like today is unacceptable. We’ve architected our systems to be highly resilient to failure to ensure traffic will always continue to flow. When we’ve had outages in the past it’s always led to us building new, more resilient systems.

On behalf of the entire team at Cloudflare, I would like to apologize for the pain we caused the Internet today.

Time (UTC)	Status	Description
11:05	Normal.	Database access control change deployed.
11:28	Impact starts.	Deployment reaches customer environments, first errors observed on customer HTTP traffic.
11:32-13:05	The team investigated elevated traffic levels and errors to Workers KV service.	The initial symptom appeared to be degraded Workers KV response rate causing downstream impact on other Cloudflare services. Mitigations such as traffic manipulation and account limiting were attempted to bring the Workers KV service back to normal operating levels. The first automated test detected the issue at 11:31 and manual investigation started at 11:32. The incident call was created at 11:35.
13:05	Workers KV and Cloudflare Access bypass implemented — impact reduced.	During investigation, we used internal system bypasses for Workers KV and Cloudflare Access so they fell back to a prior version of our core proxy. Although the issue was also present in prior versions of our proxy, the impact was smaller as described below.
13:37	Work focused on rollback of the Bot Management configuration file to a last-known-good version.	We were confident that the Bot Management configuration file was the trigger for the incident. Teams worked on ways to repair the service in multiple workstreams, with the fastest workstream a restore of a previous version of the file.
14:24	Stopped creation and propagation of new Bot Management configuration files.	We identified that the Bot Management module was the source of the 500 errors and that this was caused by a bad configuration file. We stopped automatic deployment of new Bot Management configuration files.
14:24	Test of new file complete.	We observed successful recovery using the old version of the configuration file and then focused on accelerating the fix globally.
14:30	Main impact resolved. Downstream impacted services started observing reduced errors.	A correct Bot Management configuration file was deployed globally and most services started operating correctly.
17:06	All services resolved. Impact ends.	All downstream services restarted and all operations fully restored.

Mastodon Stories for systemd v258

2025-11-18 Lennart Poettering

Post Syndicated from Lennart Poettering original https://0pointer.net/blog/mastodon-stories-for-systemd-v258.html

Already on Sep 17 we released systemd v258 into the wild.

In the weeks leading up to that release I have posted a series of
serieses of posts to Mastodon about key new features in this release,
under the #systemd258 hash
tag. It was my intention to post a link list here on this blog right
after completing that series, but I simply forgot! Hence, in case you
aren’t using Mastodon, but would like to read up, here’s a list of all
37 posts:

Post #1: systemctl start -v
Post #2: Home areas
Post #3: systemd-resolved delegate zones
Post #4: Foreign UID range
Post #5: /etc/hostname ??? wildcards
Post #6: Quota on /tmp/
Post #7: ConcurretnySoftMax= + ConcurrencyHardMax=
Post #8: Product UUID in ConditionHost=
Post #9: Context OSC terminal sequences
Post #10: uki-url Boot Loader Spec Type #1 fields
Post #11: rd.break= boot breakpoints
Post #12: Factory Reset Rework
Post #13: systemd-resolved DNS Configuration Change IPC Subscription API
Post #14: io.systemd.boot-entries.extra= SMBIOS Type #11 Key
Post #15: Bring Your Own Firmware
Post #16: userdb record aliases
Post #17: systemd-validatefs and its xattrs
Post #18: Offline Signing of Artifacts
Post #19: PAMName= in services hooked up to ask-password protocol
Post #20: x-systemd.graceful-option= mount option
Post #21: systemd-userdb-load-credentials.service
Post #22: systemd-vmspawn –grow-image=a
Post #23: systemd-notify –fork
Post #24: $TERM auto-discovery
Post #25: Rebooting/Powering off systemd-nspawn containers via hotkey
Post #26: ExecStart= | modifier
Post #27: systemctl reload reloads confexts
Post #28: Server side userdb filtering
Post #29: Quota on StateDirectory= and friends
Post #30: systemd-analyze unit-shell
Post #31: /etc/issue.d/ drop-in for AF_VSOCK CID
Post #32: fsverity in systemd-repart
Post #33: AcceptFileDescriptor= + PassPIDFD=
Post #34: Tab completion in interactive systemd-firstboot
Post #35: rd.systemd.pull= kernel command line option/Boot into tarball
Post #36: ConditionKernelModuleLoaded=
Post #37: systemd-analyze chid
Post #38: homectl list-signing-keys/get-signing-key/add-signing-key/remove-signing-key
Post #39: DDI Image Filters
Post #40: Android USB Debugging udev rules
Post #41: systemd-vmspawn’s –smbios11= switch
Post #42: $MAINPIDFDID + $MANAGERPIDFDID
Post #43: $DEBUG_INVOCATION=1 Respected by all systemd services
Post #44: LoaderDeviceURL EFI Variable and systemd.pull=’s origin kernel command line switch
Post #45: cgroupv1 removal
Post #46: ProtectHostname=private
Post #47: homectl adopt + homectl register
Post #48: systemd-machined Varlink APIs
Post #49: DeferTrigger and “lenient” job mode
Post #50: Automatic Removal of foreign UID owned delegate subgroups in the per-user service manager
Post #51: Per-user ask-password protocol
Post #52: PrivateUsers=full
Post #53: LoadCredentialEncrypted= in the per-user service manager
Post #54: dissect_image builtin in systemd-udevd
Post #55: BPF Delegation via Tokens

I intend to do a similar series of serieses of posts for the next systemd
release (v259), hence if you haven’t left tech Twitter for Mastodon yet, now is
the opportunity.

We intend to shorten the release cycle a bit for the future, and in
fact managed to tag v259-rc1 already yesterday, just 2 months after
v258. Hence, my series for v259 will begin soon, under the
#systemd259 hash tag.

In case you are interested, here is the corresponding blog story for
systemd v257,
and here for
v256.

New Python features

Template strings literal

Evaluation of type annotations

Improved Error Messages

Standard library

Lambda runtime changes

Python 3.14 features that are not available

Amazon Linux 2023

Using Python 3.14 in Lambda

AWS Management Console

Upgrading a function to Python 3.14

AWS Lambda container image

AWS Serverless Application Model (AWS SAM)

AWS Cloud Development Kit

Performance considerations

Conclusion

Prerequisites

Solution overview

Walkthrough

Step 1: Download or clone the repository

Step 2: Create the CloudFormation StackSet

Step 3: Update your EC2 instance profiles with proper permissions

Step 4: Verify the solution implementation

Verify preference configuration

Validate session functionality

Resources

Parameter groups

Conclusion

Solution overview

Prerequisites

Configure the infrastructure stack

Configure the message processing application

Build and deploy the Spring Boot application to App Runner

Set up real-time updates

Deploy the React application to Amazon S3 and Amazon CloudFront

Validate the solution

Automated testing script

Validation through the web interface

Amazon CloudWatch dashboards and alarms

Security considerations

Cost considerations

Troubleshooting

Clean up

Next steps

Conclusion

About the authors

Ransomware: Same playbook, more precision

The offense is automated: AI goes to work

The human factor: Still the weakest link

From awareness to action: Resilience as a mandate

Join us: Predicting what’s next in 2026

Why monitoring PaperCut matters

HTTP agent items

Dependent items + JSONPATH

Calculated items

Low-level discovery (LLD) for devices and printers

Why this matters

Our expertise

How Obama Did It

AI and Relational Organizing

(Artificial) Identity Politics

Organizing with AI

Exploring new technologies and inclusive teaching

Learning and making across continents

Learning beyond the screen

Showcasing creativity with Coolest Projects

What the community is taking forward

Stay connected

The outage

How Cloudflare processes requests, and how this went wrong today

The query behaviour change

Memory preallocation

Other impact during the incident

Remediation and follow-up steps

The collective thoughts of the interwebz