Security updates for Thursday

Post Syndicated from jake original https://lwn.net/Articles/1025208/

Security updates have been issued by AlmaLinux (kernel), Debian (chromium, gst-plugins-bad1.0, node-tar-fs, and ublock-origin), Gentoo (Emacs, File-Find-Rule, GStreamer, GStreamer Plugins, GTK+ 3, LibreOffice, Node.js, OpenImageIO, Python, PyPy, Qt, X.Org X server, XWayland, and YAML-LibYAML), Mageia (mariadb and roundcubemail), Red Hat (go-toolset:rhel8, golang, grafana, grafana-pcp, gstreamer1-plugins-bad-free, libxml2, libxslt, mod_security, nodejs:20, and perl-FCGI:0.78), Slackware (mozilla), SUSE (docker, docker-compose, iputils, kernel, libsoup, open-vm-tools, rabbitmq-server, rabbitmq-server313, wget, and yelp), and Ubuntu (libsoup2.4 and webkit2gtk).

Build a multi-Region AWS PrivateLink backed service with seamless failover

Post Syndicated from Madhav Vishnubhatta original https://aws.amazon.com/blogs/architecture/build-a-multi-region-aws-privatelink-backed-service-with-seamless-failover/

Global Payments Inc. is a leading worldwide provider of payment technology and software solutions, headquartered in Atlanta, Georgia. The company processes more than 75 billion transactions annually, serving more than 5 million merchant locations and nearly 2,000 financial institutions globally. Through its merger with TSYS in 2019, Global Payments expanded its capabilities beyond merchant acquiring services to include issuer processing solutions.

The company’s services now encompass ecommerce and omnichannel payments, business management software, customer engagement tools, and cloud-based solutions. Their commitment to technological innovation and customer service has positioned them as one of the largest financial technology companies globally, consistently ranking among the Fortune 500.

This post demonstrates how the Issuer Solutions business of Global Payments, as a service provider, implemented cross-Region failover for an AWS PrivateLink backed service exposed to their customers. Their solution enables failover to a secondary Region without customer coordination, reducing Recovery Time Objective (RTO).

AWS PrivateLink involves two key roles: service providers and service consumers. Service providers build, own, and manage endpoint services. Service consumers create and manage Amazon Virtual Private Cloud (Amazon VPC) endpoints that connect their VPC to an Amazon VPC endpoint service privately, without exposing the traffic to the public internet. Enterprises often build services in multiple AWS Regions for resilience. This approach requires endpoint services in two Regions, with service consumers creating VPC endpoints for each service.

The architecture uses Amazon Route 53 to resolve the service’s Fully Qualified Domain Name (FQDN) to the active Region’s service. Amazon Application Recovery Controller (ARC) is used to initiate the failover.

Customer requirements

Issuer Solutions had the following requirements for this implementation:

  • The ability for consumers to access the service privately, without traversing the public internet
  • Resilience to degradation in a single Region by allowing failover to a secondary Region
  • Independent failover without customer coordination
  • Reliable and simple failover process

Solution overview

The following simplified architecture diagram illustrates the connectivity and failover mechanisms. The exact service implementation of Issuer Solutions is beyond this post’s scope. For simplicity, we represent the service as a Network Load Balancer backed by Amazon Elastic Compute Cloud (Amazon EC2) instances.

AWS Architecture diagram showing primary and secondary regions with Route53 Private Hosted Zones, VPC endpoints, and PrivateLink integration, illustrating the connectivity and failover mechanisms.

The solution consists of the following key components:

  • The service provider deploys identical services in two Regions. The service is represented in this simplified version with a Network Load Balancer backed by EC2 instances. Each Region is independent and therefore resilient against failures in the other Region.
  • Services are exposed through PrivateLink as VPC endpoint services in each Region, allowing client connections without needing NAT gateways or internet gateways and keeping the traffic within the customers’ own private IP space.
  • The service provider authorizes the consumer AWS account to find the services using the VPC endpoint service names.
  • The service consumer uses the VPC PrivateLink endpoint service names to create VPC endpoints with Elastic Network Interfaces (ENIs) in two Availability Zones in each Region. The consumer does this in both consumer Regions, and each consumer Region has two sets of VPC endpoints: one for the primary service of the provider and one for the secondary service.
  • The service consumer creates a Route 53 private hosted zone in each of the two Regions they use, each with two alias records, with a simple routing policy, pointing to the VPC endpoints’ FQDNs in their own Regions. These two alias records are primary.example.com and secondary.example.com.
  • The service provider creates a Route 53 ARC cluster with routing controls for the primary and secondary Regions.
  • The service provider creates a private hosted zone with a failover record set for a consumer specific CNAME like custC.service.p.com that resolves to primary.example.com and secondary.example.com. These records are associated with health checks that are associated with Route 53 ARC routing controls.

As shown in the architecture diagram, it is not necessary for the provider’s and consumer’s Regions to be the same because AWS supports creating VPC endpoints in a Region different to that of the VPC endpoint service itself. Refer to AWS PrivateLink now supports cross-region connectivity for more details.

How the DNS resolution works

Each consumer Region has two Route 53 private hosted zones involved in DNS resolution in this approach: the service provider’s hosted zone and the service consumer’s hosted zone. Both hosted zones are associated with the VPC used by the service consumer. Here is how the DNS resolution works:

  1. When a client in the consumer VPC wants to reach the service, it uses the FQDN custC.service.p.com.
  2. The hosted zone in the service provider’s account resolves custC.service.p.com to either primary.example.com or secondary.example.com depending on the status of the health checks controlled by ARC. For now, let’s assume this resolves to primary.example.com.
  3. Next, primary.example.com resolves to the VPC endpoint FQDN com.amazonaws.vpce.<primary-region>.vpce-svcabc123 due to the hosted zone in the service consumer.

At the time of failover, the service provider will update the ARC health checks to turn off the primary control and turn on the secondary routing control. This causes the service provider’s hosted zone to resolve custC.service.p.com to secondary.example.com, which in turn resolves to the VPC endpoint FQDN in the secondary Region due to the service consumer’s hosted zone.

Considerations

With this setup, the service provider can fail over when they need to without the service consumer having to manually make any changes. This is especially useful for services that have multiple service consumers. Additionally, service consumers can make changes to the VPC endpoint as they see fit. They only need to update the hosted zone they manage to make sure that primary.example.com and secondary.example.com point to the correct VPC endpoints.

We used ARC in this post because it offers a robust solution for cross-Region failover with built-in static stability. ARC also avoids single points of failure in the failover logic by distributing it across multiple Regions.

This setup demonstrates an active-passive configuration where traffic is routed to a single Region at a time using the Route 53 failover routing policy. For an active-active approach, you can adapt this setup by employing alternative Route 53 policies such as weighted or latency-based routing, as detailed in Active-active and active-passive failover.

Code sample

We have created a GitHub repository with Terraform code to demonstrate this solution. The repository has the steps to set it up and test it.

Conclusion

This implementation of cross-Region failover by Global Payments Issuer Solutions for their PrivateLink backed service demonstrates a robust and flexible approach to providing high availability and resilience. By using AWS services such as PrivateLink, Route 53, and Route 53 ARC, they have created a solution that meets their key requirements. This architecture not only benefits Global Payments by allowing them to manage failovers efficiently, but also provides advantages to their service consumers. Customers maintain control over their own infrastructure while benefiting from seamless service continuity. As cloud architectures continue to evolve, solutions like this showcase the power of combining various AWS services to create highly available and fault-tolerant systems that meet complex business needs.

Clone our GitHub repository now and deploy this solution in your own AWS environment to try out this approach to cross-Region failover. Contact your AWS representative today to begin your journey toward enhanced business continuity.


About the Authors

How Stellantis streamlines floating license management with serverless orchestration on AWS

Post Syndicated from Göksel SARIKAYA original https://aws.amazon.com/blogs/architecture/how-stellantis-streamlines-floating-license-management-with-serverless-orchestration-on-aws/

This post is written by Goeksel Sarikaya, Senior Delivery Consultant at AWS, and Milosz Stawarski, Senior Software Architect at Stellantis.

Software licensing is a critical aspect of many organizations’ operations, with various models available to suit different needs. Two common types are named user licenses, which are assigned to specific individuals, and floating licenses, which can be shared among a pool of users. Some independent software vendors (ISVs) offer both options, whereas others might have limitations, particularly in cloud environments.

In this post, we explore a unique scenario where an ISV, unable to provide a floating license option for cloud usage, worked with Stellantis to develop an alternative solution. This approach, implemented with the ISV’s permission, treats named user licenses as if they were floating, automatically assigning and removing them based on the state of user workbench instances.

This solution is not intended to circumvent licensing terms or reduce costs at the expense of ISVs. Rather, it’s a collaborative approach to address specific customer needs when traditional floating licenses aren’t available. We will demonstrate how the solution uses serverless AWS services like Amazon EventBridge, AWS Lambda, Amazon DynamoDB, and AWS Systems Manager, keeping in mind that any similar implementation should only be pursued with explicit permission from the software vendor.

Overview of Stellantis

Stellantis N.V., born from the merger of FCA and PSA Group, leads the change towards software defined vehicles (SDV). As part of this transformation, AWS and Stellantis created the Virtual Engineering Workbench (VEW), a modular framework to develop, integrate, and test vehicle software in the cloud, ultimately connecting their vehicles to the cloud.

The VEW provides predefined environments tailored to specific use cases. These environments come fully equipped with the tools, integrated development environments (IDEs), and licensing necessary for developers to jumpstart their projects.

For more details on VEW, refer to Stellantis’ SDV transformation with the Virtual Engineering Workbench on AWS.

Overview of solution

As the number of developers and projects grew, Stellantis faced a challenge in managing the limited number of named user licenses for their software tools. The manual process of assigning and revoking licenses became increasingly time-consuming and inefficient, potentially hindering the agility and productivity of their development teams.

Stellantis and AWS tackled this challenge head-on by collaborating on an innovative, dynamic license management solution using AWS serverless services. This solution transforms the traditional named user license model into a more flexible floating license system, automatically assigning and revoking licenses based on the state of user workbench instances. The licenses and solution discussed in this post pertain solely to the use of standalone software tools such as those used in automotive domains. These do not involve sharing of user data or content when licenses are reused.

Before we dive into the detailed workflow of the solution, let’s examine the high-level architecture. The following diagram illustrates how various AWS services work together to create this efficient license management system.

Multi-region AWS license management architecture showing event-driven workflows between toolchain and user accounts with VEW workbench integration

Architecture

This architecture uses key AWS services such as EventBridge, Lambda, DynamoDB, and Systems Manager to create a scalable, serverless solution that significantly reduces administrative overhead and optimizes license utilization.

In the following sections, we explore each component of this architecture in detail, explaining how they interact to provide a seamless license management experience for Stellantis’ VEW.

In workbench accounts (user accounts)

The design is serverless and based on an event-driven approach. The workflow in the user accounts is as follows:

  1. Workbench instances are Amazon Elastic Compute Cloud (Amazon EC2). Their start and stop automatically sends AWS events.
  2. An EventBridge rule invokes a Lambda function when such an event occurs. This function checks the tags on the EC2 instance to distinguish workbenches from other EC2 instances. Two tags are important for identifying workbench instances: vew:workbench:ownerId and vew:workbench:type.
  3. The Lambda function creates a custom event with the following data: user-id, workbench-type, workbench-state, and instance-id, and sends this event to the default event bus.
  4. An EventBridge rule forwards the custom event to a custom event bus in the license server account.

In license server account

The following steps take place in the license server account:

  1. An EventBridge rule invokes a Lambda
  2. This function interacts with a DynamoDB table that stores a mapping of licensed products to users. The function does the following:
    1. Deduces the licensed products present in the workbench from the workbench type.
    2. For each licensed product, it verifies if the combination of product and user is already present in the DynamoDB
    3. If the workbench is starting:
      1. If the combination is already present, it increases the count of workbenches in the table for this item by 1.
      2. If the combination is not present, it creates a new item in the table (product, user-id, workbench-count, timestamp).
    4. If the workbench is stopping, it decreases the count of workbenches in the table for this item by 1. If the count becomes 0, the item is deleted.
  3. Any update to the DynamoDB table triggers another Lambda
  4. If the change in the table is a creation of a new entry or deletion of an entry, this function writes the current timestamp to a Systems Manager parameter in both cases. This is so that if no changes are detected in the database, we don’t unnecessarily run the xLC (License Client for related product) caller function.
  5. Another Lambda function is invoked every minute. It compares the timestamp written in the Systems Manager parameter indicating a DynamoDB item creation or deletion with the last time the function called the xLC CLI to assign users to a license.
  6. If the DynamoDB timestamp is earlier, the function stops. If the DynamoDB timestamp is later, the function queries the table for obtaining the user-id for each product.
  7. To maintain a comprehensive record of license assignment operations, you can enable data plane events for DynamoDB in AWS CloudTrail.
  8. For each licensed product, the function uses Run Command, a capability of Systems Manager, to invoke the xLC CLI API on the license server to assign named users to a license for a product. The function provides the list of users assigned to the product to the API. This updates the named user list on the license server—the list is completely overwritten, which includes adding new user IDs and removing ones that are no longer needed.

Benefits and key features

The solution offers the following benefits:

  • Automated license assignment and removal – Users are automatically assigned licenses when their workbench instances start, and licenses are returned to the pool when instances stop, providing efficient license utilization.
  • Scalable and serverless architecture – The solution is built on serverless AWS services, allowing it to scale seamlessly as the number of users and workbench instances grows, without the need for provisioning or managing servers.
  • Centralized license management – The license server account acts as a central hub for managing licenses across multiple workbench accounts, simplifying administration and providing a unified view of license usage.
  • Reduced administrative overhead – By automating the license assignment and removal process, the solution can significantly reduce the administrative burden associated with manual license management.
  • Optimized license utilization – Licenses are assigned only when needed and returned to the pool when no longer required, maximizing license availability and minimizing idle licenses.
  • Monitoring and metrics – The solution provides monitoring capabilities and license usage metrics, enabling better visibility and informed decision-making regarding license procurement and allocation.

Conclusion

By implementing this serverless solution, it is possible to transform a manual named user license management systems to an automated floating license system for software tools. The event-driven architecture and serverless components provide efficient and scalable license assignment and removal based on the workbench instance state.

This solution has streamlined the license management process, reducing administrative overhead and optimizing license utilization. It is now possible to provision software tools more efficiently, improving productivity and resource allocation across the organization. Additionally, the centralized license management and monitoring capabilities provide better visibility and control over license usage, enabling informed decision-making and cost optimization.

Overall, this AWS based floating license solution has empowered organizations to use software tools more effectively, while minimizing the operational burden associated with license management. For more serverless learning resources, visit Serverless Land.


About the authors

How Nexthink built real-time alerts with Amazon Managed Service for Apache Flink

Post Syndicated from Nikos Tragaras, Raphaël Afanyan original https://aws.amazon.com/blogs/big-data/how-nexthink-built-real-time-alerts-with-amazon-managed-service-for-apache-flink/

This post is cowritten with Nikos Tragaras and Raphaël Afanyan from Nexthink.

In this post, we describe Nexthink’s journey as they implemented a new real-time alerting system using Amazon Managed Service for Apache Flink. We explore the architecture, the rationale behind key technology choices, and the Amazon Web Services (AWS) services that enabled a scalable and efficient solution.

Nexthink is a pioneering leader in digital employee experience (DEX). With a mission to empower IT teams and elevate workplace productivity, Nexthink’s Infinity platform offers real-time visibility into end user environments, actionable insights, and robust automation capabilities. By combining real-time analytics, proactive monitoring, and intelligent automation, Infinity enables organizations to deliver an optimal digital workspace.

In the past 5 years, Nexthink completed its transformation into a fully-fledged cloud platform that processes trillions of events per day, reaching over 5 GB per second of aggregated throughput. Internally, Infinity comprises more than 300 microservices that use the power of Apache Kafka through Amazon Managed Service for Apache Kafka (Amazon MSK) for data ingestion and intra-service communication. The Nexthink ecosystem includes several hundreds of Micronaut-based Java microservices deployed in Amazon Elastic Kubernetes Service (Amazon EKS). The vast majority of microservices interact with Kafka through the Kafka Streams framework.

Nexthink alerting system

To help you understand Nexthink’s journey toward a new real-time alerting solution, we begin by examining the existing system and the evolving requirements that led them to seek a new solution.

Nexthink’s existing alerting system provides near real-time notifications, helping users detect and respond to critical events quickly. While effective, this system has limitations in scalability, flexibility, and real-time processing capabilities.

Nexthink gathers telemetry data from thousands of customers’ laptops covering CPU usage, memory, software versions, network performance, and more. Amazon MSK and ClickHouse serve as the backbone for this data pipeline. All endpoint data is ingested in Kafka multi-tenant topics, which are processed and finally stored in a ClickHouse database.

Using the current alerting system, clients can define monitoring rules in Nexthink Query Language (NQL), which are evaluated in near real time by polling the database every 15 minutes. Alerts are triggered when anomalies are detected against client-defined thresholds or long-term baselines. This process is illustrated in the following architecture diagram.

Originally, database-polling allowed great flexibility in the evaluation of complex alerts. However, this approach placed heavy stress on the database. As the company grew and supported larger customers with more endpoints and monitors, the database experienced increasingly heavy loads.

Evolution to a new use-case: Real-time alerts

As Nexthink expanded its data collection to include virtual desktop infrastructure (VDI), the need for real-time alerting became even more critical. Unlike traditional endpoints, such as laptops, where events are gathered every 5 minutes, VDI data is ingested every 30 seconds—significantly increasing the volume and frequency of data. The existing architecture relied on database polling to evaluate alerts, running at a 15-minute interval. This approach was inadequate for the new VDI use case, where alerts needed to be evaluated in near real time on messages arriving every 30 seconds. Merely increasing the polling frequency wasn’t a viable option because it would place excessive load on the database, leading to performance bottlenecks and scalability challenges. To meet these new demands efficiently, we shifted to real-time alert evaluation directly on Kafka topics.

Technology options

As we evaluated solutions for our real-time alerting system, we analyzed two main technology options: Apache Kafka Streams and Apache Flink. Each option had benefits and limitations that needed to be considered.

All Nexthink microservices up to that point integrated with Kafka using Apache Kafka Streams. We’ve observed in practice multiple benefits:

  • Lightweight and seamless integration. No need for additional infrastructure.
  • Low latency using RocksDB as a local key-value store.
  • Team expertise. Nexthink teams have been writing microservices with Kafka-streams for a long time and feel very comfortable using it.

In some use cases however, we found that there were important limitations:

  • Scalability – Scalability was constrained by the tight coupling between parallelism of microservices and the number of partitions in Kafka topics. Many microservices had already scaled out to match the partition count of the topics they consumed, limiting their ability to scale further. One potential solution was increasing the partition count. However, this approach introduced significant operational overhead, especially with microservices consuming topics owned by other domains. It required rebalancing the entire Kafka cluster and needed coordination across multiple teams. Additionally, such modifications impacted downstream services, requiring careful reconfiguration of stateful processing. The alternative approach would be to introduce intermediate topics to redistribute workload, but this would add complexity to the data pipeline and increase resource consumption on Kafka. These challenges made it clear that a more flexible and scalable approach was needed.
  • State management – Services that needed to create large K-tables in memory had an increased startup time. Also, in cases where the internal state was large in volume, we found that it applied significant load to the Kafka cluster during the creation of the internal state.
  • Late event processing – In windowing operations, late events had to be managed manually with techniques that complexified the codebase.

Seeking an alternative that could help us overcome the challenges posed by our current system, we decided to evaluate Flink. Its robust streaming capabilities, scalability, and flexibility made it an excellent choice for building real-time alerting systems based on Kafka topics. Several advantages made Flink particularly appealing:

  • Native integration with Kafka – Flink offers native connectors for Kafka, which is a central component in the Nexthink ecosystem.
  • Event-time processing and support for late events – Flink allows messages to be processed based on the event time (that is, when the event actually occurred) even if they arrive out of order. This feature is crucial for real-time alerts because it guarantees their accuracy.
  • Scalability – Flink’s distributed architecture allows it to scale horizontally independently from the number of partitions in the Kafka topics. This feature weighed a lot in our decision-making because the dependence on the number of partitions was a strong limitation in our platform up to this point.
  • Fault tolerance – Flink supports checkpoints, allowing managed state to be persisted and ensuring consistent recovery in case of failures. Unlike Kafka Streams, which relies on Kafka itself for long-term state persistence (adding extra load to the cluster), Flink’s checkpointing mechanism operates independently and runs out-of-band, minimizing the impact on Kafka while providing efficient state management.
  • Amazon Managed Service for Apache Flink – Amazon Managed Service for Apache Flink is a fully managed service that simplifies the deployment, scaling, and management of Flink applications for real-time data processing. By eliminating the operational complexities of managing Flink clusters, AWS enables organizations to focus on building and running real-time analytics and event-driven applications efficiently. Amazon Managed Service for Apache Flink provided us with significant flexibility. It streamlined our evaluation process, which meant we could quickly set up a proof-of-concept environment without getting into the complexities of managing an internal Flink cluster. Moreover, by reducing the overhead of cluster management, it made Flink a viable technology choice and accelerated our delivery timeline.

Solution

After careful evaluation of both options, we chose Apache Flink as our solution due to its superior scalability, robust event-time processing, and efficient state management capabilities. Here’s how we implemented our new real-time alerting system.

The following diagram is the solution architecture.

The first use case was to detect issues with VDI. However, our intention was to build a generic solution that would give us the option to onboard in the future existing use cases currently implemented through polling. We wanted to maintain a common way of configuring monitoring conditions and allow alert evaluation both with polling as well as in real time, depending on the type of device being monitored.

This solution comprises multiple parts:

  • Monitor configuration – Using Nexthink Query Language (NQL), the alerts administrator defines a monitor that specifies, for example:
    • Data source – VDI events
    • Time window – Every 30 seconds
    • Metric – Average network latency, grouped by desktop pool
    • Trigger condition(s) – Latency exceeding 300 ms for a continual period of 5 minutes

This monitor configuration is then stored in an internally developed document store and propagated downstream in a Kafka topic.

  • Data processing using Generic Stream Services– The Nexthink Collector, an agent installed on endpoints, captures and reports various kinds of activities from the VDI endpoints where it’s installed. These events are forwarded to Amazon MSK in one of Nexthink’s production virtual private clouds (VPCs) and are consumed by Java microservices running on Amazon EKS belonging to several domains within Nexthink

One of them is Generic Stream Services, a system that processes the collected events and aggregates them in buckets of 30 seconds. This component works as self-service for all the feature teams in Nexthink and can query and aggregate data from an NQL query. This way, we were able to keep a unified user experience on monitor configuration using NQL, regardless of how alerts were evaluated. This component is broken down into two services:

    • GS processor – Consumes raw VDI session events and applies initial processing
    • GS aggregator – Groups and aggregates the data according to the monitor configuration
  • Real-time monitoring using Flink – Static threshold alerting and seasonal change detection, which identifies variations in data that follow a recurring pattern over time, are the two types of detection that we offer for VDI issues. The system splits the processing between two applications:
    • Baseline application – Calculates statistical baselines with seasonality using time-of-day anomaly algorithm. For example, the latency by VDI client location or the CPU queue length of a desktop pool.
    • Alert application – Generates alerts based on user-defined thresholds when the unexpected values don’t change over time or dynamic thresholds based on baselines, which trigger when a metric deviates from an expected pattern.

The following diagram illustrates how we join VDI metrics with monitor configurations, aggregate data using sliding time windows, and evaluate threshold rules, all within Apache Flink. From this process, alerts are generated and are then grouped and filtered before being processed further by the consumers of alerts.

  • Alert processing and notifications – After an alert is triggered (when a threshold is exceeded) or recovered (when a metric returns to normal levels), the system will assess their impact to prioritize response through the impact processing module. Alerts are then consumed by notification services that deliver messages through emails or webhooks. The alert and impact data are then ingested into a time series database.

Benefits of the new architecture

One of the key advantages of adopting a streaming-based approach over polling was its ease of configuration and management, especially for a small team of three engineers. There was no need for cluster management, so all we needed to do was to provision the service and start coding.

Given our prior experience with Kafka and Kafka Streams and combined with the simplicity of a managed service, we were able to quickly develop and deploy a new alerting system without the overhead of complex infrastructure setup. We used Amazon Managed Service for Apache Flink to spin up a proof of concept within a few hours, which meant the team could focus on defining the business logic without having concerns related to cluster management.

Initially, we were concerned about the challenges of joining multiple Kafka topics. With our previous Kafka Streams implementation, joined topics required identical partition keys, a constraint known as co-partitioning. This created an inflexible architecture, particularly when integrating topics across different business domains. Each domain naturally had its own optimal partitioning strategy, forcing difficult compromises.

Amazon Managed Service for Apache Flink solved this problem through its internal data partitioning capabilities. Although Flink still incurs some network traffic when redistributing data across the cluster during joins, the overhead is practically negligible. The resulting architecture is both more scalable (because topics can be scaled independently based on their specific throughput requirements) and easier to maintain without complex partition alignment concerns.

This significantly improved our ability to detect and respond to VDI performance degradations in real time while keeping our architecture clean and efficient.

Lessons learnt

As with any new technology, adopting Flink for real-time processing came with its own set of challenges and insights.

One of the primary difficulties we encountered was observing Flink’s internal state. Unlike Kafka Streams, where the internal state is by default backed by a Kafka topic from which its content can be visualized, Flink’s architecture makes it inherently difficult to inspect what is happening inside a running job. This required us to invest in robust logging and monitoring strategies to better understand what is happening during the execution and debug issues effectively.

Another critical insight emerged around late event handling—specifically, managing events with timestamps that fall within a time-window’s boundaries but arrive after that window has closed. Amazon Managed Service for Apache Flink addresses this challenge through its built-in watermarking mechanism. A watermark is a timestamp-based threshold that indicates when Flink should consider all events before a specific time to have arrived. This allows the system to make informed decisions about when to process time-based operations like window aggregations. Watermarks flow through the streaming pipeline, enabling Flink to track the progress of event time processing even with out-of-order events.

Although watermarks provide a mechanism to manage late data, they introduce challenges when dealing with multiple input streams operating at different speeds. Watermarks work well when processing events from a single source but can become problematic when joining streams with varying velocities. This is because they can lead to unintended delays or premature data discards. For example, a slow stream can hold back processing across the entire pipeline, and an idle stream might cause premature window closing. Our implementation required careful tuning of watermark strategies and allowable lateness parameters to balance processing timeliness with data completeness.

Our transition from Kafka Streams to Apache Flink proved smoother than initially anticipated. Teams with Java backgrounds and prior experience with Kafka Streams found Flink’s programming model intuitive and easy to use. The DataStream API offers familiar concepts and patterns, and Flink’s more advanced features could be adopted incrementally as needed. This gradual learning curve gave our developers the flexibility to become productive quickly, focusing first on core stream processing tasks before moving on to more advanced concepts like state management and late event processing.

The future of Flink in Nexthink

Real-time alerting is now deployed to production and available to our clients. A major success of this project was the fact that we successfully introduced a technology as an alternative to Kafka streams, with very little management requirements, guaranteed scalability, data-management flexibility, and comparable cost.

The impact on the Nexthink alerting system was significant because we no longer have a single evaluating alert through database polling. Therefore, we’re already assessing the timeframe for onboarding other alerting use cases to real-time evaluation with Flink. This will alleviate database load and will also provide more accuracy on the alert triggering.

Yet the impact of Flink isn’t limited to the Nexthink alerting system. We now have a proven production-ready alternative for services that are limited in terms of scalability due to the number of partitions of the topics they are consuming. Thus, we’re actively evaluating the option to convert more services to Flink to allow them to scale out more flexibly.

Conclusion

Amazon Managed Service for Apache Flink has been transformative for our real-time alerting system at Nexthink. By handling the complex infrastructure management, AWS enabled our team to deploy a sophisticated streaming solution in less than a month, keeping our focus on delivering business value rather than managing Flink clusters.

The capabilities of Flink have proven it to be more than an alternative to Kafka Streams. It’s become a compelling first choice for both new projects and existing feature refactoring. Windowed processing, late event management, and stateful streaming operations have made complex use cases remarkably straightforward to implement. As our development teams continue to explore Flink’s potential, we’re increasingly confident that it will play a central role in Nexthink’s real-time data processing architecture moving forward.

To get started with Amazon Managed Service for Apache Flink, explore the getting started resources and the hands-on workshop. To learn more about Nexthink’s broader journey with AWS, visit the blog post on Nexthink’s MSK-based architecture.


About the authors

Nikos Tragaras is a Principal Software Architect at Nexthink with around two decades of experience in building distributed systems, from traditional architectures to modern cloud-native platforms. He has worked extensively with streaming technologies, focusing on reliability and performance at scale. Passionate about programming, he enjoys building clean solutions to complex engineering problems

Raphaël Afanyan is a Software Engineer and Tech Lead of the Alerts team at Nexthink. Over the years, he has worked on designing and scaling data processing systems and played a key role in building Nexthink’s alerting platform. He now collaborates across teams to bring innovative product ideas to life, from backend architecture to polished user interfaces.

Simone Pomata is a Senior Solutions Architect at AWS. He has worked enthusiastically in the tech industry for more than 10 years. At AWS, he helps customers succeed in building new technologies every day.

Subham Rakshit is a Senior Streaming Solutions Architect for Analytics at AWS based in the UK. He works with customers to design and build streaming architectures so they can get value from analyzing their streaming data. His two little daughters keep him occupied most of the time outside work, and he loves solving jigsaw puzzles with them. Connect with him on LinkedIn.

Lorenzo Nicora works as a Senior Streaming Solutions Architect at AWS, helping customers across EMEA. He has been building cloud-centered, data-intensive systems for over 25 years, working across industries both through consultancies and product companies. He has used open source technologies extensively and contributed to several projects, including Apache Flink.

Bringing data science to life for K–12 students with the ‘API Can Code’ curriculum

Post Syndicated from Diana Kirby original https://www.raspberrypi.org/blog/bringing-data-science-to-life-for-k-12-students-with-the-api-can-code-curriculum/

As data and data-driven technologies become a bigger part of everyday life, it’s more important than ever to make sure that young people are given the chance to learn data science concepts and skills.

In our April research seminar, David Weintrop, Rotem Israel-Fishelson, and Peter Moon from the University of Maryland introduced API Can Code, a data science curriculum designed with high school students for high school students. Their talk explored how their innovative work uses real-world data and students’ own experiences and interests to create meaningful, authentic learning experiences in data science.

Quick note for educators: Are you interested in joining our free, exploratory data science education workshop for teachers on 10 July 2025 in Cambridge, UK? Then find out the details here.

David started by explaining the motivation behind the API Can Code project. The team’s goal was not to turn students into future data scientists, but to offer students the data literacy they need to explore and critically engage with a data-driven world. 

The work was also guided by a shared view among leading teachers’ organisations that data science should be taught across all subjects in the K–12 curriculum. It also draws on strong research showing that when educational experiences connect with students’ own lives and interests, it leads to deeper engagement and better learning outcomes.

Reviewing the landscape

To prepare for the design of the curriculum, David, Rotem, and Peter wanted to understand what data science education options already exist for K–12 students. Rotem described how they compared four major K–12 data science curricula and examined different aspects, such as the topics they covered and the datasets they used. Their findings showed that many datasets were quite small in size, and that the datasets used were not always about topics that students were interested in.

A classroom of young learners and a teacher at laptops

The team also looked at 30 data science tools used across different K–12 platforms and analysed what each could do. They found that tools varied in how effective they were and that many lacked accessibility features to support students with diverse learning needs. 

This analysis helped to refine the team’s objective: to create a data science curriculum that students find interesting and that is informed by their values and voices.

Participatory design

To work towards this goal, the team used a methodology called participatory design. This is an approach that actively involves the end users — in this case, high school students — in the design process. During several in-person sessions with 28 students aged 15 to 18 years old, the researchers facilitated low-tech, hands-on activities exploring the students’ identities and interests and how they think about data.

One activity, Empathy Map, involved students working together to create a persona representing a student in their school. They were asked to describe the persona’s daily life, interests, and concerns about technology and data:

The students’ involvement in the design process gave the team a better understanding of young people’s views and interests, which helped create the design of the API Can Code curriculum.

API Can Code: three units, three key tools

Peter provided an overview of the API Can Code curriculum. It follows a three-unit flow covering different concepts and tools in each unit:

  1. Unit 1 introduces students to different types of data and data science terminology. The unit explores the role of data in the students’ daily lives, how use and misuse of data can affect them, different ways of collecting and presenting data, and how to evaluate databases for aspects such as size, recency, and trustworthiness. It also introduces them to RapidAPI, a hub that connects to a wide range of APIs from different providers, allowing students to access real-world data such as Zillow housing prices or Spotify music data.
  2. Unit 2 covers the computing skills used in data science, including the use of programming tools to run efficient data science techniques. Students learn to use EduBlocks, a block-based programming environment where students can draw in JSON files from RapidAPI datasets, and process and filter data without needing a lot of text-based programming skills. The students also compare this approach with manual data processing, which they discover is very slow.
  3. Unit 3 focuses on data analysis, visualisation, and interpretation. Students use CODAP, a web-based interactive data science tool, to calculate summary statistics, create graphs, and perform analyses. CODAP is a user-friendly but powerful platform, making it perfect for students to analyse and visualise their data sets. Students also practise interpreting pre-made graphs and the graphs and statistics that they are creating.

Peter described an example activity carried out by the students, showing how these three units flow together and build both technical skills and an understanding of the real-world uses of data science. Students were tasked with analysing a dataset from Zillow, a property website, to explore the question “How much does a house in my neighbourhood cost?” The images below show the process the students followed, which uses the data science skills and tools from all three units of the curriculum.

Interest-driven learning in action

A central tenet of API Can Code is that students should explore data that matters to them. A diverse range of student interests was identified during the design work, and the curriculum uses these areas of interest, such as music, movies, sports, and animals, throughout the lessons.

The curriculum also features an open-ended final project, where students can choose a research question that is important to them and their lives, and answer it using data science skills.

The team shared two examples of memorable final projects. In one, a student set out to answer the question “Is Jhené Aiko a star?” The student found a publicly available dataset through an API provided by Deezer, a music streaming platform. She wrote a program that retrieved data on the artist’s longevity and collaborations, analysed the data, and concluded that Aiko is indeed a star. What stood out about this project wasn’t just the fact that the student independently defined stardom and answered their research question using real data, but that this was a truly personal, interest-driven project. David noted that the researchers could never have come up with this activity, since they had never previously heard of Jhené Aiko!

Jhené Aiko, an R&B singer-songwriter
Jhené Aiko, an R&B singer-songwriter 
(Photo by Charito Yap, licensed under CC BY-ND 2.0)

Another student’s project analysed data about housing in Washington DC to answer the question “Which ward in DC has the most affordable houses?” Rotem explained that this student was motivated by her family thinking about moving away from the city. She wanted to use her project to persuade her parents to stay by identifying the most affordable ward in DC that they could move to. She was excited by the outcome of her project, and she presented her findings to other students and her parents.

These projects underscore the power of personally important data science projects driven by students’ interests. When students care about the questions they are exploring, they’re more invested in the process and more likely to keep using the skills and concepts they learn.

Resources

API Can Code is available online and completely free to use. Teachers can access lesson plans, tutorial videos, assessment rubrics, and more from the curriculum’s website https://apicancode.umd.edu/. The site also provides resources to support students, including example programs and glossaries.

Join our next seminar

In our current seminar series, we’re exploring teaching about AI and data science. Join us at our next seminar on Tuesday, 17 June from 17:00 to 18:30 BST to hear Netta Iivari (University of Oulu) introduce transformative agency and its importance for children’s computing education in the age of AI.

To sign up and take part in our research seminars, click below:

You can also view the schedule of our upcoming seminars, and catch up on past seminars on our previous seminars and recordings page.

The post Bringing data science to life for K–12 students with the ‘API Can Code’ curriculum appeared first on Raspberry Pi Foundation.

Защо е малко вероятно в България да спечели демократичен президент

Post Syndicated from Светла Енчева original https://www.toest.bg/zashto-e-malko-veroyatno-v-bulgaria-da-specheli-demokratichen-prezident/

Защо е малко вероятно в България да спечели демократичен президент

През есента на 2026 г. в България предстоят избори за президент. Засега никоя политическа сила не е посочила своя кандидат за държавен глава, нито има ясна визия какви да са основните послания в кампанията. Не се знаят и конфигурациите, които биха застанали зад един или друг претендент. При толкова много неизвестни нека поне опитаме

да скицираме електоралния пейзаж.

Претендентите за поста може да се разделят с оглед на електоралната си база в три основни групи – две големи и една по-малка.

В първата са потенциалните евроскептични и повече или по-малко проруски кандидати. Там напливът е най-голям – „Възраждане“, „Величие“, БСП, ИТН, МЕЧ и всевъзможни „патриоти“. И най-важното – евентуален кандидат, ползващ се с подкрепата на настоящия президент, който е политикът с най-висок рейтинг в България.

Във втората ще е фаворитът на ГЕРБ/Пеевски. Това може да е лично Бойко Борисов или някой друг. В тази група не се очаква конкуренция, поне на видимо равнище. Идеологическият ѝ облик е аморфен – и Борисов, и Пеевски го играят в зависимост от конкретната ситуация и евроатлантици, и евроскептици, и почитатели на „силната ръка“. 

В третата група са идентифициращите се като демократични и проевропейски партии – „Продължаваме промяната“ и „Демократична България“ (коалицията между ДСБ и „Да, България“). До април 2024 г. и Зелено движение беше част от ДБ, но днес то по-скоро се държи като активистка група. Не участва в последните парламентарни избори и няма яснота ще вземе ли някакво отношение към президентските.

От мравешки и от птичи поглед

В редиците на ПП–ДБ и техните симпатизанти основната интрига е дали партиите ще се обединят около общ кандидат-президент, или различията и амбициите им ще надделеят.

Следващият по важност въпрос е какъв да бъде евентуалният консенсусен кандидат. „Нужно е лице, което вдъхва надежда, интегрира отвъд партийното ядро, комуникира ясно европейската посока и не се страхува да назове опонентите си“, казва Емилия Милчева.

Способността на кандидата да се хареса отвъд партийните ядра е особено важна, защото понастоящем ПП–ДБ могат да разчитат на подкрепата на има-няма 15% от гласоподавателите. Макар идеята за предварителни избори да звучи демократично, по всяка вероятност в тях биха се включили само твърдите електорати на партиите от коалицията. И няма гаранция, че спечелилият ще може да привлече широка периферия избиратели.

Ала докато в редиците на крехката коалиция ври и кипи, как изглежда ситуацията в сравнителен европейски план?

През 2025 г. се проведоха президентски избори в Румъния и Полша. И в двете страни кандидатите, стигнали до втория тур, бяха демократичен, който е и кмет на столицата, и евроскептичен/проруски.

В Румъния спечели демократичният кандидат. Това обаче стана от втория път – в края на 2024 г. резултатите от първия вот за президент, спечелен от евроскептичен кандидат, бяха отменени заради разузнавателни данни за намеса на Русия в изборния процес. В Полша спечели, макар и с малко, проруският кандидат.

И в Румъния, и в Полша, за разлика от България, има силни продемократични и проевропейски обществени настроения, довели до масови протести. В Румъния целта на протестите беше запазването на геополитическата ориентация на страната. В Полша имаше масово недоволство срещу консервативни реформи по времето на управлението на предишния премиер Ярослав Качински, като например забраната на абортите.

В България масови протести в името на демократични ценности на практика няма.

Демонстрациите против премахването на светския характер на българското образование например събраха няколко хиляди души – немалко, но далеч от масово. Недоволството срещу забраната на т.нар. ЛГБТ пропаганда в училище беше още по-бутиково, а насилието над жени мобилизира обществената енергия най-вече при конкретни случаи и бързо отшумява.

Борбата против корупцията и за правосъдна реформа не е непременно свързана с демократичните ценности. Тя може да се води и в името на установяване на управление на „силнатя ръка“, която „да сложи всичко в ред“. Друг въпрос е, че и антикорупцията вече не катализира особена обществена енергия.

Що се отнася до службите,

българското разузнаване предприе мерки срещу руското влияние в страната през 2022 г., когато на власт беше правителството в оставка на Румен Петков. И преди, и след това от ДАНС не изглежда да правят нещо. Въпреки

При такива служби кой би разследвал евентуална намеса на руското разузнаване в български избори?

Как се разширява периферия?

За да има шанс евентуален демократичен кандидат да спечели президентските избори, е необходима не само подходяща личност, която да увлича хората. Трябва да съществува и важна кауза, която да насочва обществената енергия в определена посока.

Междукоалиционните дрязги дали кандидатът да е по-десен, или по-либерален, дали да е утвърден политик, или надпартийно лице и т.н., изпускат от поглед голямата картина. А тя е, че който и да е кандидатът, той (или тя, макар да е малко вероятно родните демократи да рискуват да предложат жена) трудно ще намери какво да каже или да направи, така че да привлече избиратели, надхвърлящи електоралното ядро на ПП–ДБ в пъти.

Евроскептичните гласоподаватели трудно могат да бъдат привлечени с лозунги против корупцията след т.нар. сглобка – правителството на Николай Денков с участието на ГЕРБ. И с натрапената подкрепа на Делян Пеевски, който използва ПП–ДБ, за да „изсветли“ имиджа си.

Приказките за борба с корупцията и за правосъдна реформа

не могат да трогнат и електората на ГЕРБ и ДПС, както и голяма част от хората, които не гласуват, защото не виждат смисъл.

Представете си ресторантьорка от малък български град. Тя знае, че за да върви бизнесът ѝ без проблеми, трябва да е в добри отношения с кмета и с местните мутри, да им дава каквото и колкото поискат, и да гласува за когото ѝ кажат. През целия ѝ живот тя и всички около нея живеят по този начин. За нея говоренето за корупция и правосъдие е празни приказки.

А сега си представете международен шофьор на камион. Официално той получава минимална работна заплата, но понеже работодателите му пестят от осигуровки, под масата взема няколко хиляди лева на месец. Човекът се оставя да го експлоатират и се притеснява, че ако бизнесът на превозвачите се изсветли, самият той ще обеднее. И не гласува, защото смята, че държавата не го защитава по никакъв начин.

С какви послания евентуалният кандидат на ПП–ДБ би могъл да привлече поне няколкостотин хиляди избиратели като посочените в примерите?

Посланията

Логично е един демократичен и проевропейски претендент за държавен глава да отправя демократични и проевропейски послания. Проблемът е, че българските демократични и проевропейски партии почти не отправят такива.

И тук дори не става дума за теми като човешки права, от които в България комай всички политически сили бягат като дявол от тамян. А за елементарни неща – например, че демокрацията е за предпочитане пред диктатурата. Или че България по конституция е светска държава. Или защо е важно да има неправителствени организации и изобщо гражданско общество.

Впрочем сещате ли се за българска политическа позиция през последните години, в която да се казва какви точно са европейските ценности?

Когато демократичните и проевропейските партии не отправят демократични и проевропейски послания, дневният ред се задава от популистите. Затова демокрацията се асоциира с правото всякакви въпроси да могат да се решават на референдум, НПО-тата се заклеймяват като „соросоиди“, неодобрението за въвеждането на еврото е толкова високо, европейските ценности се възприемат като „еврогейски“ и т.н.

И ако демократичният и проевропейски кандидат-президент тепърва започне да отправя демократични и проевропейски послания, едва ли ще има много хора, които ще искат да ги чуят.

Затова най-реалистичната стратегия е да се разчита на прагматизма.

Тоест да не се говори за принципи и ценности, а за ползи и рискове – какво рискуваме, ако завием към Русия, и какво печелим, ако сме интегрирани в ЕС.

Хубаво, но на това поле вероятно ще се състезават също ГЕРБ/ДПС. И ще се хвалят, че благодарение на тях България е запазила проевропейския си курс и е приела еврото. Което е фактологически вярно.

В такъв случай стратегията на евентуалния демократичен кандидат би могла да е да ги разобличава, показвайки тяхната неавтентичност. Например да припомня, че енергийната зависимост на България от Русия беше практически бетонирана по време на правителствата на Борисов и че това се промени чак когато започна войната на Русия срещу Украйна. Че Борисов подари на Путин кученце. Че ГЕРБ не е направила нищо за ограничаването на руското влияние в България.

Може и да се припомня, че на председателя на ГЕРБ най-близки на сърцето са му евроскептици като Виктор Орбан и че самият той нерядко се държи като евроскептик. Както и че Пеевски мени геополитическите си позиции според ситуацията – в един момент той може да си припише заслуги пред Украйна, в следващия – да се обяви срещу изпращането на военна помощ там.

Но печеливша стратегия ли е това?

Дали на критична част от българските гласоподаватели изобщо ѝ пука кой е по-автентичен демократ и кой по-малко?

Големият проблем не е дали ПП–ДБ ще предложат общ кандидат, а как такъв кандидат би могъл да спечели изборите. След като с отказа си да отстояват демокрацията партиите, идентифициращи се като демократични, са допринесли самата демокрация вече да няма особено значение в България. А принадлежността към европейското семейство е изпразнена от ценностно съдържание.

Накрая на немногото демократични и проевропейски избиратели най-вероятно ще им остане да се надяват президентските избори да спечелят ГЕРБ/ДПС. Защото познатото корупционно и клонящо към авторитаризъм блато е за предпочитане пред резкия геополитически завой към Русия. Пък и блатото си го познаваме. Досега парламентът и редовните правителства успяваха в някаква степен да противодействат на пропутинската ориентация на Румен Радев. Но ако и новият президент е проруски настроен, това ще даде тласък на следващи парламентарни избори, на които е доста вероятно евроскептиците да постигнат мнозинство.

Такива са перспективите, освен ако не се случи някое събитие с огромна важност, което рязко да събуди демократичните инстинкти на населението. Поне засега такова не се очертава.

Podman Container Monitoring with Prometheus Exporter, part 2

Post Syndicated from Janis Eidaks original https://blog.zabbix.com/podman-container-monitoring-with-prometheus-exporter-part-2/30538/

In the first part of this post, we explored how to get data with HTTP agent from the Prometheus Podman exporter and use the same item data for the Podman pods Discovery rule as well as item and trigger prototypes. In part 2 of the same series, we’ll learn how to discover and monitor Podman containers.

Creating a template discovery rule

I will create another discovery rule for container discovery. This discovery rule is also based on the same item [Podman info] in the template – Podman containers by HTTP and Prometheus (you can check part one of this series to find out how to configure it). The parameters of the discovery rule are shown below. This discovery rule will allow us to discover the pod name and ID.

Template: Podman containers by HTTP and Prometheus

▲ Discovery rule
  ▪ Name:                   Container discovery
  ▪ Type                    Dependent item
  ▪ Key:                    training.containers.discovery
  ▪ Master item             Podman containers by HTTP and Prometheus: Podman info
  ▪ Delete lost resources  After 10d
  ▪ Disable lost resources Immediately
♯ Preprocessing
  ▪ Prometheus to JSON     podman_container_info
♦ LLD Macros
  ▪ {#CONTAINER.ID}        $.labels.id
  ▪ {#CONTAINER.NAME}      $.labels.name
Fig 1. Discovery rule: Container discovery
Fig 2. Discovery rule: Container discovery preprocessing tab
Fig 3. Discovery rule: Container discovery LLD macros tab

Next, different dependent item prototypes are created in this container discovery rule. As the Prometheus Podman exporter provides a lot of different metrics about the containers, I will create multiple such items: state, health, creation date, input/output network traffic information, and so on. So, check out what metrics can be acquired and use what is relevant for you.

You can also add a description of each item prototype. I am interested only in metrics with the discovered container ID macros, and I am not interested in what values are for the other fields, such as pod_id, pod_name, so I use ~”.*”, which matches any value. I will show the item prototype configuration screenshots of one of the item prototypes.

These item prototypes are similar to each other, with some minor differences, such as Prometheus patterns, or in some cases, with a different master item (item prototype as master item).

Fig 4. Discovery rule preprocessing step: Prometheus to JSON with pattern podman_container_info
Fig 5. Discovery rule LLD macros: assigning relevant JSONPATH to LLD macros

Creating a template discovery rule: Item prototypes

After the containers have been discovered, we have to create item prototypes. These prototypes will also be dependent item prototypes and will use the same item as the discovery rule: Podman info. Prometheus Podman exported returns a lot more metrics for the containers than it did for the pods.

You can get container metrics such as container health, state, creation date, disk read/write, memory usage, network usage, and more. In this blog post, I have added most of them, so check what metrics are relevant to your monitoring needs and start monitoring.

Fig 6. Low-level discovery rule and item prototypes based on the same item.

The screenshots of the item prototype is shown below.

Fig 7. Container state item prototype tab
Fig 8. Container state item prototype tag tab
Fig 9. Container state item prototype preprocessing tab

Remember, you can also test these item prototypes in the preprocessing step – just copy the Prometheus exporter data and set the relevant macro to value you want to check.

The configuration parameters of the item prototypes are shown below. There are a lot of metrics you can monitor, but remember to monitor what is relevant and necessary for you.

Template: Podman containers by HTTP and Prometheus; Discovery rule: Container discovery

● Item prototype #1
  ▪ Name: 		Container health: [{#CONTAINER.NAME}]
  ▪ Type 		Dependent item
  ▪ Key: 		container.health[{#CONTAINER.NAME}]
  ▪ Type of inf: 	Numeric (float)
  ▪ Master item		Podman containers by HTTP and Prometheus: Podman info
  ▪ Units: 
♦ Tags (name:value) 	
  ▪ Container:{#CONTAINER.NAME}	
  ▪ Metric:health		
♯ Preprocessing
  ▪ Prometheus pattern 	podman_container_health{id="{#CONTAINER.ID}",pod_id=~".*",pod_name=~".*"} value

● Item prototype #2
  ▪ Name: 		Container state: [{#CONTAINER.NAME}]
  ▪ Type 		Dependent item
  ▪ Key: 		container.state[{#CONTAINER.NAME}]
  ▪ Type of inf: 	Numeric (float)
  ▪ Master item		Podman containers by HTTP and Prometheus: Podman info
  ▪ Units: 		
♦ Tags (name:value)  			
  ▪ Container:{#CONTAINER.NAME}	
  ▪ Metric:state		
♯ Preprocessing
  ▪ Prometheus pattern	podman_container_state{id="{#CONTAINER.ID}",pod_id=~".*",pod_name=~".*"} value

● Item prototype #3
  ▪ Name: 		Created at: [{#CONTAINER.NAME}]
  ▪ Type 		Dependent item
  ▪ Key: 		container.created[{#CONTAINER.NAME}]
  ▪ Type of inf: 	Numeric (unsigned)
  ▪ Master item		Podman containers by HTTP and Prometheus: Podman info
  ▪ Units: 		unixtime
♦ Tags (name:value) 		
  ▪ Container:{#CONTAINER.NAME}	
  ▪Metric:created		
♯ Preprocessing
  ▪ Prometheus pattern 	podman_container_created_seconds{id="{#CONTAINER.ID}",pod_id=~".*",pod_name=~".*"} value

● Item prototype #4
  ▪ Name: 		Disk read per second: [{#CONTAINER.NAME}]
  ▪ Type 		Dependent item
  ▪ Key: 		container.disk.read[{#CONTAINER.NAME}]
  ▪ Type of inf: 	Numeric (unsigned)
  ▪ Master item		Podman containers by HTTP and Prometheus: Podman info
  ▪ Units: 		B
♦ Tags (name:value) 	
  ▪ Container:{#CONTAINER.NAME}	
  ▪ Metric:disk_read		
♯ Preprocessing
  ▪ Prometheus pattern	podman_container_block_output_total{id="{#CONTAINER.ID}",pod_id=~".*",pod_name=~".*"} value
  ▪ Change per second

● Item prototype #5
  ▪ Name: 		Disk write per second: [{#CONTAINER.NAME}]
  ▪ Type 		Dependent item
  ▪ Key: 		container.disk.write[{#CONTAINER.NAME}]
  ▪ Type of inf: 	Numeric (unsigned)
  ▪ Master item		Podman containers by HTTP and Prometheus: Podman info
  ▪ Units: 		B
♦ Tags (name:value) 	
  ▪ Container:{#CONTAINER.NAME}	 
  ▪ Metric:disk_write		
♯ Preprocessing
  ▪ Prometheus pattern	podman_container_block_input_total{id="{#CONTAINER.ID}",pod_id=~".*",pod_name=~".*"} value
  ▪ Change per second

● Item prototype #6
  ▪ Name: 		Exit code: [{#CONTAINER.NAME}]
  ▪ Type 		Dependent item
  ▪ Key: 		container.exit_code[{#CONTAINER.NAME}]
  ▪ Type of inf: 	Numeric (float)
    ▪ Master item	Podman containers by HTTP and Prometheus: Podman info
▪ Units: 			
♦ Tags 			
  ▪ Container:{#CONTAINER.NAME}	
  ▪ Metric:exit_code	
♯ Preprocessing
  ▪ Prometheus pattern	podman_container_exit_code{id="{#CONTAINER.ID}",pod_id=~".*",pod_name=~".*"} value

● Item prototype #7
  ▪ Name: 		Image tags: [{#CONTAINER.NAME}]
  ▪ Type 		Dependent item
  ▪ Key: 		container.image.tags[{#CONTAINER.NAME}]
  ▪ Type of inf: 	Character
  ▪ Master item		Podman containers by HTTP and Prometheus: Podman info
  ▪ Units: 			
♦ Tags 			
  ▪ Container:{#CONTAINER.NAME}	
  ▪ Metric:tag
♯ Preprocessing
▪ Prometheus pattern podman_container_info{id="{#CONTAINER.ID}",image=~".*",name=~".*",pod_id=~".*",pod_name=~".*",ports=~".*"} label image
  ▪ Regular expression	\.*(\/.\w.*)	\1

● Item prototype #8
  ▪ Name: 		Memory usage: [{#CONTAINER.NAME}]
  ▪ Type 		Dependent item
  ▪ Key: 		container.mem[{#CONTAINER.NAME}]
  ▪ Type of inf: 	Numeric (unsigned)
  ▪ Master item		Podman containers by HTTP and Prometheus: Podman info
  ▪ Units: 		B
♦ Tags 			
  ▪ Container:{#CONTAINER.NAME}	
  ▪ Metric:mem		
♯ Preprocessing
  ▪ Prometheus pattern podman_container_mem_usage_bytes{id="{#CONTAINER.ID}",pod_id=~".*",pod_name=~".*"} value

● Item prototype #9
  ▪ Name: 		Network input dropped: [{#CONTAINER.NAME}]
  ▪ Type 		Dependent item
  ▪ Key: 		container.net.in.drop[{#CONTAINER.NAME}]
  ▪ Type of inf: Numeric (unsigned)
  ▪ Master item		Podman containers by HTTP and Prometheus: Podman info
  ▪ Units: 		packets
♦ Tags 			
  ▪ Container:{#CONTAINER.NAME}	
  ▪ Metric:net_in_drop		
♯ Preprocessing
  ▪ Prometheus pattern	podman_container_net_input_dropped_total{id="{#CONTAINER.ID}",pod_id=~".*",pod_name=~".*"} value

● Item prototype #10
  ▪ Name: 		Network input errors: [{#CONTAINER.NAME}]
  ▪ Type 		Dependent item
  ▪ Key: 		container.net.in.errors[{#CONTAINER.NAME}]
  ▪ Type of inf: 	Numeric (unsigned)
  ▪ Master item		Podman containers by HTTP and Prometheus: Podman info
  ▪ Units: 		
♦ Tags 			
  ▪ Container:{#CONTAINER.NAME}	
  ▪ Metric:net_in_err		
♯ Preprocessing
  ▪ Prometheus pattern	podman_container_net_input_errors_total{id="{#CONTAINER.ID}",pod_id=~".*",pod_name=~".*"} value

● Item prototype #11
  ▪ Name: 		Network input total: [{#CONTAINER.NAME}]
  ▪ Type 		Dependent item
  ▪ Key: 		container.net.in.total[{#CONTAINER.NAME}]
  ▪ Type of inf: 	Numeric (unsigned)
  ▪ Master item		Podman containers by HTTP and Prometheus: Podman info
  ▪ Units: 		B
♦ Tags 			
  ▪ Container:{#CONTAINER.NAME}	
  ▪ Metric:net_in_tot
♯ Preprocessing
  ▪ Prometheus pattern	podman_container_net_input_total{id="{#CONTAINER.ID}",pod_id=~".*",pod_name=~".*"} value

● Item prototype #12
  ▪ Name: 		Network input per second: [{#CONTAINER.NAME}]
  ▪ Type 		Dependent item
  ▪ Key: 		container.net.in.change[{#CONTAINER.NAME}]
  ▪ Type of inf: 	Numeric (float)
  ▪ Master item		prototype - Network input total: [{#CONTAINER.NAME}] 
  ▪ Units: 		Bps
♦ Tags 			
  ▪ Container:{#CONTAINER.NAME}	
  ▪ Metric:net_in_change
♯ Preprocessing
  ▪ Change per second

● Item prototype #13
  ▪ Name: 		Network output dropped: [{#CONTAINER.NAME}]
  ▪ Type 		Dependent item
  ▪ Key: 		container.net.out.drop[{#CONTAINER.NAME}]
  ▪ Type of inf: 	Numeric (unsigned)
  ▪ Master item		Podman containers by HTTP and Prometheus: Podman info
  ▪ Units: 		
♦ Tags 			
  ▪ Container:{#CONTAINER.NAME}	
  ▪ Metric:net_out_drop	
♯ Preprocessing
  ▪ Prometheus pattern	podman_container_net_output_dropped_total{id="{#CONTAINER.ID}",pod_id=~".*",pod_name=~".*"} value

● Item prototype #14
  ▪ Name: 		Network output errors: [{#CONTAINER.NAME}]
  ▪ Type 		Dependent item
  ▪ Key: 		container.net.out.errors[{#CONTAINER.NAME}]
  ▪ Type of inf: 	Numeric (unsigned)
  ▪ Master item		Podman containers by HTTP and Prometheus: Podman info
  ▪ Units: 		
♦ Tags 			
  ▪ Container:{#CONTAINER.NAME}	
  ▪ Metric:net_out_err	
♯ Preprocessing
  ▪ Prometheus pattern	podman_container_net_output_errors_total{id="{#CONTAINER.ID}",pod_id=~".*",pod_name=~".*"} value

● Item prototype #15
  ▪ Name: 		Network output total: [{#CONTAINER.NAME}]
  ▪ Type 		Dependent item
  ▪ Key: 		container.net.out.total[{#CONTAINER.NAME}]
  ▪ Type of inf: 	Numeric (unsigned)
  ▪ Master item		Podman containers by HTTP and Prometheus: Podman info
  ▪ Units: 		B
♦ Tags 			
  ▪ Container:{#CONTAINER.NAME}	
  ▪ Metric:net_out_tot	
♯ Preprocessing
  ▪ Prometheus pattern	podman_container_net_output_total{id="{#CONTAINER.ID}",pod_id=~".*",pod_name=~".*"} value

● Item prototype #16
  ▪ Name: 		Network output per second: [{#CONTAINER.NAME}]
  ▪ Type 		Dependent item
  ▪ Key: 		container.net.out.change[{#CONTAINER.NAME}]
  ▪ Type of inf: 	Numeric (float)
  ▪ Master item		prototype - Network output total: [{#CONTAINER.NAME}]
  ▪ Units: 		Bps
♦ Tags 			 
  ▪ Container:{#CONTAINER.NAME}	
  ▪ Metric:net_out_change
♯ Preprocessing
  ▪ Name			Change per second

● Item prototype #17
  ▪ Name: 		Rootfs size: [{#CONTAINER.NAME}]
  ▪ Type 		Dependent item
  ▪ Key: 		container.rootfs.size[{#CONTAINER.NAME}]
  ▪ Type of inf: Numeric (unsigned)
  ▪ Master item		Podman containers by HTTP and Prometheus: Podman info
  ▪ Units: 		B
♦ Tags 			
  ▪ Container:{#CONTAINER.NAME}	
  ▪ Metric:rootfs
♯ Preprocessing
  ▪ Prometheus pattern	podman_container_rootfs_size_bytes{id="{#CONTAINER.ID}",pod_id=~".*",pod_name=~".*"} value

● Item prototype #18
  ▪ Name: 		Total system CPU time: [{#CONTAINER.NAME}]
  ▪ Type 		Dependent item
  ▪ Key: 		container.cpu.time
  ▪ Type of inf: 	Numeric (float)
  ▪ Master item		Podman containers by HTTP and Prometheus: Podman info
  ▪ Units: 		s
♦ Tags 			
  ▪ Container:{#CONTAINER.NAME}	
  ▪ Metric:sys_time
♯ Preprocessing
  ▪ Prometheus pattern: podman_container_cpu_system_seconds_total{id="{#CONTAINER.ID}",pod_id=~".*",pod_name=~".*"} value

Creating a template discovery rule: Trigger prototype

I have created a user macro {$CONTAINER.RUNNING.STATE} on the template with a value of 2, which corresponds to the containers running state. After that, create a trigger prototype to check if the container is in different state other than running.

Template: Podman containers by HTTP and Prometheus; Discovery rule: Container discovery

◘ Trigger prototypes
  ▪ Name:               Container [{#CONTAINER.NAME}] state has changed from running
  ▪ Severity:           Warning
  ▪ Expression:         last(/Podman containers by HTTP and Prometheus/container.state[{#CONTAINER.NAME}])<>{$CONTAINER.RUNNING.STATE}
  ▪ PROBLEM event generation mode: Single
  ▪ OK event closes: All problems

So, once all of this is done, and some container status changes from running and or pod status also changes from running, you will get a problem event.

Fig 10. Generated problem events when the podman pod and container change states.

Technically, I could also create a trigger for container health; however, as all of the received container values for me are -1 (meaning unknown) it makes little sense to make a trigger that will fire right away. You can also add additional item/trigger prototypes in the template. If everything is set up as expected, you should see something like the screenshot below after the LLD rule execution.

Fig 11. Example of the mysql-server container and zabbix pod item values.

Summary

Now, you can monitor both Podman pods and containers using both blog posts of this series. We used the same template item for both the container LLD and item prototypes from the first part of this post.

The post Podman Container Monitoring with Prometheus Exporter, part 2 appeared first on Zabbix Blog.

Celebrating 11 years of Project Galileo’s global impact

Post Syndicated from Jocelyn Woolbright original https://blog.cloudflare.com/celebrating-11-years-of-project-galileo-global-impact/

June 2025 marks the 11th anniversary of Project Galileo, Cloudflare’s initiative to provide free cybersecurity protection to vulnerable organizations working in the public interest around the world. From independent media and human rights groups to community activists, Project Galileo supports those often targeted for their essential work in human rights, civil society, and democracy building.

A lot has changed since we marked the 10th anniversary of Project Galileo. Yet, our commitment remains the same: help ensure that organizations doing critical work in human rights have access to the tools they need to stay online.  We believe that organizations, no matter where they are in the world, deserve reliable, accessible protection to continue their important work without disruption.

For our 11th anniversary, we’re excited to share several updates including:

  • An interactive Cloudflare Radar report providing insights into the cyber threats faced by at-risk public interest organizations protected under the project. 

  • An expanded commitment to digital rights in the Asia-Pacific region with two new Project Galileo partners.

  • New stories from organizations protected by Project Galileo working on the frontlines of civil society, human rights, and journalism from around the world.


Tracking and reporting on cyberattacks with the Project Galileo 11th anniversary Radar report 

To mark Project Galileo’s 11th anniversary, we’ve published a new Radar report that shares data on cyberattacks targeting organizations protected by the program. It provides insights into the types of threats these groups face, with the goal of better supporting researchers, civil society, and vulnerable groups by promoting the best cybersecurity practices. Key insights include:

  • Our data indicates a growing trend in DDoS attacks against these organizations, becoming more common than attempts to exploit traditional web application vulnerabilities.

  • Between May 1, 2024, to March 31, 2025, Cloudflare blocked 108.9 billion cyber threats against organizations protected under Project Galileo. This is an average of nearly 325.2 million cyber attacks per day over the 11-month period, and a 241% increase from our 2024 Radar report. 

  • Journalists and news organizations experienced the highest volume of attacks, with over 97 billion requests blocked as potential threats across 315 different organizations. The peak attack traffic was recorded on September 28, 2024. Ranked second was the Human Rights/Civil Society Organizations category, which saw 8.9 billion requests blocked, with peak attack activity occurring on October 8, 2024.

  • Cloudflare onboarded the Belarusian Investigative Center, an independent journalism organization, on September 27, 2024, while it was already under attack. A major application-layer DDoS attack followed on September 28, generating over 28 billion requests in a single day. 

  • Many of the targets were investigative journalism outlets operating in regions under government pressure (such as Russia and Belarus), as well as NGOs focused on combating racism and extremism, and defending workers’ rights.

  • Tech4Peace, a human rights organization focused on digital rights, was targeted by a 12-day attack beginning March 10, 2025, that delivered over 2.7 billion requests. The attack saw prolonged, lower-intensity attacks and short, high-intensity bursts. This deliberate variation in tactics reveals a coordinated approach, showing how attackers adapted their methods throughout the attack.

The full Radar report includes additional information on public interest organizations, human and civil rights groups, environmental organizations, and those involved in disaster and humanitarian relief. The dashboard also serves as a valuable resource for policymakers, researchers, and advocates working to protect public interest organizations worldwide.

Global partners are the key to Project Galileo’s continued growth

Partnerships are core to Project Galileo success. We rely on 56 trusted civil society organizations around the world to help us identify and support groups who could benefit from our protection. With our partners’ help, we’re expanding our reach to provide tools to communities that need protection the most. Today, we’re proud to welcome two new partners to Project Galileo who are championing digital rights, open technologies, and civil society in Asia and around the world. 


EngageMedia is a nonprofit organization that brings together advocacy, media, and technology to promote digital rights, open and secure technology, and social issue documentaries. Based in the Asia-Pacific region, EngageMedia collaborates with changemakers and grassroots communities to protect human rights, democracy, and the environment.

As part of our partnership, Cloudflare participated in a 2025 Tech Camp for Human Rights Defenders hosted by EngageMedia, which brought together around 40 activist-technologists from across Asia-Pacific. Among other things, the camp focused on building practical skills in digital safety and website resilience against online threats. Cloudflare presented on common attack vectors targeting nonprofits and human rights groups, such as DDoS attacks, phishing, and website defacement, and shared how Project Galileo helps organizations mitigate these risks. We also discussed how to better promote digital security tools to vulnerable groups. The camp was a valuable opportunity for us to listen and learn from organizations on the front lines, offering insights that continue to shape our approach to building effective, community-driven security solutions.


Founded in 2014 by leaders of Taiwan’s open tech communities, the Open Culture Foundation (OCF) supports efforts to protect digital rights, promote civic tech, and foster open collaboration between government, civil society, and the tech community. Through our partnership, we aim to support more than 34 local civil society organizations in Taiwan by providing training and workshops to help them manage their website infrastructure, address vulnerabilities such as DDoS attacks, and conduct ongoing research to tackle the security challenges these communities face.

Stories from the field  

We continue to be inspired by the amazing work and dedication of the organizations that participate in Project Galileo. Helping protect these organizations and allowing them to focus on their work is a fundamental part of helping build a better Internet. Here are some of their stories:

  • Fair Future Foundation (Indonesia): non-profit that provides health, education, and access to essential resources like clean water and electricity in ultra-rural Southeast Asia. 

  • Youth Initiative for Human Rights (Serbia): regional NGO network promoting human rights, youth activism, and reconciliation in the Balkans.

  • Belarusian Investigative Center (Belarus): media organization that conducts in-depth investigations into corruption, sanctions evasion, and disinformation in Belarus and neighboring regions. 

  • The Greenpeace Canada Education Fund (GCEF) (Canada): non-profit that conducts research, investigations, and public education on climate change, biodiversity, and environmental justice. 

  • Insight Crime (LATAM): nonprofit think tank and media organization that investigates and analyzes organized crime and citizen security in Latin America and the Caribbean. 

  • Diez.md (Moldova): youth-focused Moldovan news platform offering content in Romanian and Russian on topics like education, culture, social issues, election monitoring and news. 

  • Engage Media (APAC): nonprofit dedicated to defending digital rights and supporting advocates for human rights, democracy, and environmental sustainability across the Asia-Pacific. 

  • Pussy Riot (Europe): a global feminist art and activist collective using art, performance, and direct action to challenge authoritarianism and human rights violations. 

  • Immigrant Legal Resource Center (United States): nonprofit that works to advance immigrant rights by offering legal training, developing educational materials, advocating for fair policies, and supporting community-based organizations.

  • 5wf Foundation (Netherlands): wildlife conservation non-profit that supports front-line conservation teams globally by providing equipment to protect threatened species and ecosystems.

These case studies offer a window into the diverse, global nature of the threats these groups face and the vital role cybersecurity plays in enabling them to stay secure online. Check out their stories and more: cloudflare.com/project-galileo-case-studies/

Continuing our support of vulnerable groups around the world 

In 2025, many of our Project Galileo partners have faced significant funding cuts, affecting their operations and their ability to support communities, defend human rights, and champion democratic values. Ensuring continued support for those services, despite financial and logistical challenges, is more important than ever. We’re thankful to our civil society partners who continue to assist us in identifying groups that need our support. Together, we’re working toward a more secure, resilient, and open Internet for all. To learn more about Project Galileo and how it supports at-risk organizations worldwide, visit cloudflare.com/galileo.

Google has New NVIDIA RTX Pro 6000 8-GPU Instances with AMD EPYC 9005 CPUs

Post Syndicated from Cliff Robinson original https://www.servethehome.com/google-has-new-nvidia-rtx-pro-6000-8-gpu-instances-with-amd-epyc-9005-cpus/

The new Google G4 instances house eight NVIDIA RTX Pro 6000 GPUs, combining for 768GB of memory, along with two AMD EPYC Turin CPUs

The post Google has New NVIDIA RTX Pro 6000 8-GPU Instances with AMD EPYC 9005 CPUs appeared first on ServeTheHome.

[$] LWN.net Weekly Edition for June 12, 2025

Post Syndicated from corbet original https://lwn.net/Articles/1023924/

Inside this week’s LWN.net Weekly Edition:

  • Front: Nyxt; Cyber Resilience Act; Unwanted file descriptors; Core-dump API; 6.16 Merge window; Uniprocessor configurations; Smatch; FUSE zero-copy; iov_iter; Fedora documentation.
  • Briefs: Android tracking; /e/OS 3.0; FreeBSD laptops; Ubuntu X11 support; Netdev 0x19; OIN anniversary; Quotes; …
  • Announcements: Newsletters, conferences, security updates, patches, and more.

The complete stream processing journey on FlinkSQL

Post Syndicated from Grab Tech original https://engineering.grab.com/the-complete-stream-processing-journey-on-flinksql

Introduction

In the fast-paced world of data analytics, real-time processing has become a necessity. Modern businesses require insights not just quickly, but in real-time to make informed decisions and stay ahead of the competition. Apache Flink has emerged as a powerful tool in this domain, offering state-of-the-art stream processing capabilities. In this blog, we introduce our FlinkSQL interactive solution in accompanying productionising automation, and enhancing our users’ stream processing development journey.

Preface

Last year, we introduced Zeppelin notebooks for Flink, as detailed in our previous post Rethinking Stream Processing: Data Exploration in an effort to enhance data exploration for downstream data users. However, as our use cases evolved over time, we quickly hit a few technical roadblocks.

Zeppelin notebook source code is maintained by a community separate from Flink’s community. As of writing, the latest Flink version supported is 1.17, whilst the latest Flink is already on version 1.20. This discrepancy in version support hinders our Flink upgrading efforts.

Cluster start up time

Our design to spin up a Zeppelin cluster per user on demand invokes a cold start delay, taking roughly around 5 minutes for the notebook to be ready. This delay is not suitable for use cases that require quick insights from production data. We quickly noticed that the user uptake of this solution was not as high as we expected.

Integration challenges

Whilst Zeppelin notebooks were useful for serving individual developers, we experienced difficulty integrating it with other internal platforms. We designed Zeppelin to empower solo data explorers, but other internal platforms like dashboards or automated pipelines needed a way to aggregate data from Kafka and Zeppelin just couldn’t keep up. The notebook setup was too isolated, requiring a workaround to share insights or plug into existing tools. For instance, if a team wanted to pull aggregated real-time metrics into a monitoring system, they had to export data manually, which is far from seamless access that we aimed for.

Introducing FlinkSQL interactive

With those considerations in mind, we decided to swap out our Zeppelin cluster with a shared FlinkSQL gateway cluster. We simplified our solution by removing some features our notebooks offered, focusing only on features that promote data democratisation.

Figure 1: Shared FlinkSQL gateway architecture

We split our solution into 3 layers:

  • Compute layer
  • Integration layer
  • Query layer

Users first interact with our platform portal to submit queries for data from Kafka online store using SQL (1). Upon submission, our backend orchestrator then creates a session for the user (2) and submits the SQL query to our FlinkSQL gateway using their inbuilt REST API (3). The FlinkSQL gateway then packages the SQL query into a Flink job to be submitted to our Flink session cluster (4) before collating its results. The subsequent results would be polled from the query layer to be displayed back to the user.

Compute layer

With FlinkSQL gateway acting as the main compute engine for ad-hoc queries, it is now more straightforward to perform Flink version upgrades along with our solution, since the FlinkSQL gateway is packaged along with the main Flink distribution. We do not need to maintain Flink shims for each version as adapters between the Flink compute cluster and Zeppelin notebook cluster.

Another advantage of using the shared FlinkSQL gateway was the reduced cold start time for each ad-hoc queries. Since all users share the same FlinkSQL cluster instead of having their own Zeppelin cluster, there was no need to wait for cluster startup during initialisation of their sessions. This brought the lead time to the first results displayed down from 5 minutes to 1 minute. There was still lead time involved as the tool provisions task managers on an ad-hoc basis to balance availability of such developer tools and the associated cost.

Integration layer

The Integration layer serves as the glue between the user-facing query layer and the underlying compute layer, ensuring seamless communication and security across our ecosystem. With the shift to a shared FlinkSQL gateway, we recognised the need for an intermediary that could handle authentication, authorisation, orchestration, and integration with internal platforms – all while abstracting the complexities of Flink’s native REST API.

Figure 2: FlinkSQL gateway

The FlinkSQL gateway’s built-in REST API gets the job done for basic query submission, but it falls short in areas like session management, requiring multiple POST requests just to fetch results. To address this, we extended a custom control plane with its own set of REST APIs, layered on top of the gateway.

We then extend these sessions and integrate them to our inhouse authentication and authorisation platform. For each query made, the control plane authenticates the user, spins up lightweight sessions and manages the communication between the caller and the Flink Session Cluster. If you are interested, check out our previous blog post, An elegant platform, for more details on the above mentioned streaming platform and its control plane.

curl --location 'https://example.com/v1/flink/flinksql/interactive' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer ...' \
--data '{
    "environment" : "prd",
    "namespace" : "flink",
    "sql" : "SELECT * FROM titanicstream"}'

Example API request for running a FlinkSQL query

The integration layer also caters to B2B needs via our Headless APIs. By exposing the endpoints, developers are able to integrate real-time processing into their own tools. To run a query, programs can simply make a POST request with the SQL query and an operation ID would be returned. This operation ID could then be used in subsequent GET requests to fetch the paginated results of the unbounded query. This setup is ideal for internal platforms that need to query Kafka data programmatically. By abstracting these complexities, it ensures that users, whether individual analysts or internal platforms—can tap into Kafka data without wrestling with Flink’s raw interfaces.

Query layer

We then proceed to pair our APIs developed with an Interactive UI to build a Query layer that serves both human workflows. This is where users meet our platform.

Figure 3: Flink query layer’s user flow

Through our platform portal, users land in a clean SQL editor. We used a Hive Metastore (HMS) catalog that translates Kafka topics into tables. Users don’t need to decode stream internals; they can jump straight into it by simply selecting a table to query on. Once a query is submitted, it is then handled by the integration layer which routes it through the control plane to the gateway. Results are then streamed back, appearing in the UI within one minute, a significant improvement from the five minute Zeppelin cold starts.

This all crystalises into the user flow demonstrated in Figure 3, where we can easily retrieve Titanic data from a Kafka stream with a short command:

SELECT COUNT(*) FROM titanicstream WHERE kafkaEventTime > NOW() - INTERVAL '1' HOUR.

This setup enables a few use cases for our teams, such as:

  • Fraud analysts using the real-time data to debug and spot patterns in fraudulent transactions.
  • Data scientists querying live signals to validate their prediction models.
  • Engineers validating the messages sent from their system to confirm they are properly structured and accurately delivered.

Productionising FlinkSQL

With data being democratised, we see more users building use cases around our online data store and utilising the above tools to build new stream processing pipelines expressed as SQL queries. To simplify the last step of the software development lifecycle of deployment, we have also developed a tool to create a configuration based stream processing pipeline, with the business logic expressed as a SQL statement.

Figure 4: Portal for FlinkSQL pipeline creation

We host connectors for users to connect to other platforms within Grab, such as Kafka and our internal feature stores. Users could simply use them off-the-shelf and configure according to their needs before deploying their stream processing pipeline.

Users would then proceed to submit their streaming logic as a SQL statement. In the example illustrated in the diagram, the logic expressed is a simple filter on a Kafka stream for sinking the filtered events into a separate Kafka stream.

Users have the ability to then define the parallelism and associated resources they want to run their Flink jobs with. Upon submission, the associated resources would be provisioned and the Flink pipeline would be automatically deployed. Behind the scenes, we manage the application JAR file that is being used to run the job that dynamically parses these configurations and translates them into a proper Flink job graph to be submitted to the Flink cluster.

Within 10 minutes, users would have completed deploying their stream processing pipeline to production.

Conclusion

With our full suite of solutions for low code development via FlinkSQL, from exploration and design, to development and then deployment, we have simplified the journey for developing business use cases off online streaming stores. By offering both a user-friendly interface for low-code users and a robust API for developers, these tools empower businesses to harness the full potential of real-time data processing. Whether you are a data analyst looking for quick insights or a developer integrating real-time analytics into your applications, our tools are able to lower the barrier of entry to utilising real-time data.

After we released these solutions, we quickly saw an uptick in pipelines created as well as the number of interactive queries fired. This result was encouraging and we hope that this would gradually bring upon a paradigm shift, enabling Grab to make data-driven operational decisions on real-time signals, empowering us with the ability to react to ever-changing market conditions in the most efficient manner.

Join us

Grab is a leading superapp in Southeast Asia, operating across the deliveries, mobility and digital financial services sectors. Serving over 800 cities in eight Southeast Asian countries, Grab enables millions of people everyday to order food or groceries, send packages, hail a ride or taxi, pay for online purchases or access services such as lending and insurance, all through a single app. Grab was founded in 2012 with the mission to drive Southeast Asia forward by creating economic empowerment for everyone. Grab strives to serve a triple bottom line – we aim to simultaneously deliver financial performance for our shareholders and have a positive social impact, which includes economic empowerment for millions of people in the region, while mitigating our environmental footprint.

Powered by technology and driven by heart, our mission is to drive Southeast Asia forward by creating economic empowerment for everyone. If this mission speaks to you, join our team today!

Monitoring AWS End User Messaging SMS Registrations with Lambda

Post Syndicated from Patrick Viker original https://aws.amazon.com/blogs/messaging-and-targeting/monitoring-aws-end-user-messaging-sms-registrations-with-lambda/

Managing AWS End User Messaging SMS registrations can be challenging, especially when dealing with multiple registrations in various states and countries. This post introduces an automated monitoring solution that helps you stay on top of your registration statuses. By leveraging AWS Lambda, EventBridge, and Simple Email Service (SES), you’ll create a system that provides regular updates about registrations that need attention, are under review, in draft pending submission, or have recently been completed.

Whether you’re managing Sender IDs, United States 10DLC campaigns, toll-free numbers, or short codes, this solution will help you maintain visibility across all your registrations and respond promptly to status changes. The setup takes approximately 15-20 minutes and requires no ongoing maintenance beyond occasional adjustments to meet your evolving needs.

Estimated Setup Time: 15-20 minutes

Prerequisites

  • An AWS account with access to Lambda, IAM, EventBridge, End User Messaging SMS, and SES
  • A verified email address in Amazon SES for sending reports.
Figure 1: AWS End User Messaging SMS registrations in the console. This view shows registrations in various states that our Lambda function will monitor.

Figure 1: AWS End User Messaging SMS registrations in the console. This view shows registrations in various states that our Lambda function will monitor.

Step 1: Set up Amazon SES

  1. Open the Amazon SES console
  2. Navigate to “Verified identities”
  3. If you have an identity verified you can skip this section
  4. If you do not already have an identity verified Click “Create identity”
  5. Review this post to learn how to verify an identity
    NOTE: Best practice is to verify a domain identity. This will authenticate your domain and improve deliverability. An email address identity, while more simple, will not be authenticated through DKIM which may decrease deliverability.

Reference: Creating and verifying identities in Amazon SES

Step 2: Create an IAM Role

  1. Open the IAM console
  2. Navigate to “Roles” and click “Create role”
  3. Select “AWS service” and “Lambda” as the use case
  4. Add only the following AWS managed policy: AWSLambdaBasicExecutionRole
  5. Name the role (e.g., “EndUserMessagingRegistrationsMonitorRole”) and click “Create role”
  6. After role creation, click on the newly created role
  7. Select “Add permissions” → “Create inline policy”
  8. Click on the “JSON” tab and paste the following policy:
    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "sms-voice:DescribeRegistrations",
                    "sms-voice:DescribeRegistrationVersions"
                ],
                "Resource": "arn:aws:sms-voice:${region}:${account-id}:registration/*"
            },
            {
                "Effect": "Allow",
                "Action": "ses:SendRawEmail",
                "Resource": [
                    "arn:aws:ses:${region}:${account-id}:identity/${domain}",
                    "arn:aws:ses:${region}:${account-id}:configuration-set/*"
                ],
                "Condition": {
                    "StringLike": {
                        "ses:FromAddress": [
                            "*@${domain}"
                        ]
                    }
                }
            }
        ]
    }
  9. Replace the placeholders in the policy:
    1. ${region}: Your AWS region (e.g., ‘us-west-2’)
    2. ${account-id}: Your AWS account ID
    3. ${domain}: Your verified SES domain
  10. Click “Next”
  11. Name the policy (e.g., “EndUserMessagingRegistrationsAccess”) and click “Create policy”

Reference: Creating a role for an AWS service (console)

Step 3: Create the Lambda Function

  1. Open the Lambda console
  2. Click “Create function”
  3. Choose “Author from scratch”
  4. Configure basic settings:
  5. Name: EndUserMessagingRegistrationsMonitor
  6. Runtime: Python 3.12
  7. Architecture: x86_64
  8. Permissions: Use the IAM role created in Step 2
  9. Click “Create function”
  10. Configure environment variables:
    1. Under “Configuration” tab → “Environment variables”
    2. Set the following key/value pairs:
      1. Key: SENDER_EMAIL, Value: [Your verified SES email]
      2. Key: RECIPIENT_EMAIL, Value: [Email to receive reports]
  11. Configure function timeout:
    1. Under “Configuration” → “General configuration”
    2. Set timeout to 1 minute
  12. In the function code area, paste the following code:
    import boto3
    import json
    from datetime import datetime, timedelta
    from collections import defaultdict
    from email.mime.text import MIMEText
    from email.mime.multipart import MIMEMultipart
    import os
    
    # Constants
    REGION = os.environ['AWS_REGION']
    SENDER_EMAIL = os.environ['SENDER_EMAIL']
    RECIPIENT_EMAIL = os.environ['RECIPIENT_EMAIL']
    COMPLETED_LOOKBACK_DAYS = 7
    
    # Initialize AWS clients
    sms_client = boto3.client('pinpoint-sms-voice-v2', region_name=REGION)
    ses_client = boto3.client('ses', region_name=REGION)
    
    # Global registration dictionary
    registrations = {
        'REQUIRES_UPDATES': defaultdict(list),
        'CREATED': defaultdict(list),
        'COMPLETED': defaultdict(list),
        'REVIEWING': defaultdict(list)
    }
    
    def get_console_url(registration_id):
        return f"https://{REGION}.console.aws.amazon.com/sms-voice/home?region={REGION}#/registrations?registration-id={registration_id}"
    
    def get_version_details(registration_id, latest_denied_version=None):
        try:
            if latest_denied_version:
                response = sms_client.describe_registration_versions(
                    RegistrationId=registration_id,
                    VersionNumbers=[latest_denied_version]
                )
            else:
                response = sms_client.describe_registration_versions(
                    RegistrationId=registration_id,
                    MaxResults=1
                )
            
            if response['RegistrationVersions']:
                return response['RegistrationVersions'][0]
        except Exception as e:
            print(f"Error getting version details for {registration_id}: {str(e)}")
        return None
    
    def is_recently_completed(version_info):
        if 'RegistrationVersionStatusHistory' in version_info:
            history = version_info['RegistrationVersionStatusHistory']
            if 'ApprovedTimestamp' in history:
                approved_time = history['ApprovedTimestamp']
                if isinstance(approved_time, datetime):
                    approved_time = approved_time.timestamp()
                lookback_time = (datetime.now() - timedelta(days=COMPLETED_LOOKBACK_DAYS)).timestamp()
                return approved_time > lookback_time
        return False
    
    def categorize_registration_type(registration_type):
        if 'TEN_DLC' in registration_type:
            return 'TEN_DLC'
        elif 'LONG_CODE' in registration_type:
            return 'LONG_CODE'
        elif 'SHORT_CODE' in registration_type:
            return 'SHORT_CODE'
        elif 'SENDER_ID' in registration_type:
            return 'SENDER_ID'
        elif 'TOLL_FREE' in registration_type:
            return 'TOLL_FREE'
        else:
            return 'OTHER'
    
    def categorize_registrations():
        global registrations
        registrations = {
            'REQUIRES_UPDATES': defaultdict(list),
            'CREATED': defaultdict(list),
            'COMPLETED': defaultdict(list),
            'REVIEWING': defaultdict(list)
        }
    
        try:
            response = sms_client.describe_registrations()
            
            for registration in response.get('Registrations', []):
                status = registration['RegistrationStatus']
                registration_id = registration['RegistrationId']
                registration_type = registration['RegistrationType']
                category = categorize_registration_type(registration_type)
                
                reg_info = {
                    'id': registration_id,
                    'type': registration_type,
                    'status': status,
                    'version': registration['CurrentVersionNumber'],
                    'console_url': get_console_url(registration_id)
                }
    
                if 'AdditionalAttributes' in registration:
                    reg_info['additional_attributes'] = registration['AdditionalAttributes']
    
                if status == 'REQUIRES_UPDATES':
                    latest_denied_version = registration.get('LatestDeniedVersionNumber')
                    version_info = get_version_details(registration_id, latest_denied_version)
                    if version_info and 'DeniedReasons' in version_info:
                        reg_info['denial_reasons'] = version_info['DeniedReasons']
                    registrations['REQUIRES_UPDATES'][category].append(reg_info)
                
                elif status == 'CREATED':
                    registrations['CREATED'][category].append(reg_info)
                
                elif status == 'REVIEWING':
                    registrations['REVIEWING'][category].append(reg_info)
                
                elif status == 'COMPLETE':
                    version_info = get_version_details(registration_id)
                    if version_info and is_recently_completed(version_info):
                        approved_timestamp = version_info['RegistrationVersionStatusHistory']['ApprovedTimestamp']
                        if isinstance(approved_timestamp, datetime):
                            approved_timestamp = approved_timestamp.timestamp()
                        reg_info['approved_timestamp'] = approved_timestamp
                        registrations['COMPLETED'][category].append(reg_info)
                    
        except Exception as e:
            print(f"Error listing registrations: {str(e)}")
            raise e
    
    def generate_html_output():
        html = """
        <html>
        <head>
            <style>
                body { 
                    font-family: Arial, sans-serif;
                    margin: 20px;
                    line-height: 1.6;
                }
                .registration-group { margin: 20px 0; }
                .registration-category { 
                    background-color: #f0f0f0;
                    padding: 10px;
                    margin: 10px 0;
                    border-radius: 5px;
                }
                .registration-item {
                    border-left: 4px solid #ccc;
                    margin: 10px 0;
                    padding: 10px;
                    background-color: #ffffff;
                }
                .requires-updates { border-left-color: #ff9900; }
                .created { border-left-color: #007bff; }
                .completed { border-left-color: #28a745; }
                .reviewing { border-left-color: #6c757d; }
                .denial-reasons {
                    background-color: #fff3f3;
                    padding: 10px;
                    margin: 5px 0;
                    border-radius: 3px;
                }
                .console-link {
                    color: #007bff;
                    text-decoration: none;
                    padding: 2px 5px;
                    border: 1px solid #007bff;
                    border-radius: 3px;
                }
                .console-link:hover {
                    background-color: #007bff;
                    color: #ffffff;
                }
                .summary {
                    margin: 20px 0;
                    padding: 15px;
                    background-color: #e9ecef;
                    border-radius: 5px;
                    box-shadow: 0 2px 4px rgba(0,0,0,0.1);
                }
                .divider {
                    border-top: 2px solid #dee2e6;
                    margin: 20px 0;
                }
                .lookback-info {
                    background-color: #e2e3e5;
                    padding: 10px;
                    border-radius: 5px;
                    margin: 10px 0;
                    font-style: italic;
                }
                h2, h3, h4 {
                    color: #333;
                    margin-top: 20px;
                }
                ul {
                    margin: 5px 0;
                    padding-left: 20px;
                }
                li {
                    margin: 5px 0;
                }
            </style>
        </head>
        <body>
        """
    
        html += f"<h2>Registration Status Report - {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}</h2>"
    
        html += f"""
        <div class="lookback-info">
            Note: Completed registrations shown are those completed within the last {COMPLETED_LOOKBACK_DAYS} days.
        </div>
        """
    
        html += '<div class="summary"><h3>Summary</h3>'
        for status, categories in registrations.items():
            total = sum(len(regs) for regs in categories.values())
            html += f"<p><strong>{status}:</strong> {total}"
            if status == 'COMPLETED':
                html += f" (last {COMPLETED_LOOKBACK_DAYS} days)"
            html += "<br>"
            for category, regs in categories.items():
                if regs:
                    html += f"&nbsp;&nbsp;{category}: {len(regs)}<br>"
            html += "</p>"
        grand_total = sum(sum(len(regs) for regs in categories.values()) for categories in registrations.values())
        html += f"<p><strong>Total Registrations:</strong> {grand_total}</p>"
        html += "</div>"
    
        html += '<div class="divider"></div>'
    
        html += "<h3>Detailed Registration Status</h3>"
    
        for status, categories in registrations.items():
            total = sum(len(regs) for regs in categories.values())
            html += f"""
            <div class="registration-group">
                <h3>{status} Registrations (Total: {total})</h3>
            """
    
            for category, regs in categories.items():
                if regs:
                    html += f"""
                    <div class="registration-category">
                        <h4>{category} Registrations ({len(regs)})</h4>
                    """
    
                    for reg in regs:
                        css_class = status.lower().replace('_', '-')
                        html += f"""
                        <div class="registration-item {css_class}">
                            <strong>Registration ID:</strong> {reg['id']}<br>
                            <strong>Type:</strong> {reg['type']}<br>
                            <strong>Status:</strong> {reg['status']}<br>
                            <strong>Version:</strong> {reg['version']}<br>
                            <strong>Console:</strong> <a href="{reg['console_url']}" class="console-link" target="_blank">Open in Console</a><br>
                        """
    
                        if (reg['type'] == 'US_TEN_DLC_BRAND_VETTING' and 
                            status == 'COMPLETED' and 
                            'additional_attributes' in reg and 
                            'VETTING_SCORE' in reg['additional_attributes']):
                            html += f"<strong>Vetting Score:</strong> {reg['additional_attributes']['VETTING_SCORE']}<br>"
    
                        if status == 'REQUIRES_UPDATES' and reg.get('denial_reasons'):
                            html += '<div class="denial-reasons"><strong>Denial Reasons:</strong><ul>'
                            for reason in reg['denial_reasons']:
                                html += f"""
                                <li>
                                    <strong>{reason.get('Reason', 'N/A')}</strong><br>
                                    {reason.get('ShortDescription', 'N/A')}<br>
                                """
                                if reason.get('LongDescription'):
                                    html += f"{reason['LongDescription']}<br>"
                                if reason.get('DocumentationLink'):
                                    html += f'<a href="{reason["DocumentationLink"]}" target="_blank">Documentation</a>'
                                html += "</li>"
                            html += "</ul></div>"
    
                        if status == 'COMPLETED':
                            approved_time = datetime.fromtimestamp(reg['approved_timestamp']).strftime('%Y-%m-%d %H:%M:%S')
                            html += f"<strong>Approved:</strong> {approved_time}<br>"
    
                        html += "</div>"
                    html += "</div>"
            html += "</div>"
    
        html += "</body></html>"
        return html
    
    def send_email(html_content, subject, recipient):
        msg = MIMEMultipart('mixed')
        msg['Subject'] = subject
        msg['From'] = SENDER_EMAIL
        msg['To'] = recipient
    
        html_part = MIMEText(html_content, 'html')
        msg.attach(html_part)
    
        try:
            response = ses_client.send_raw_email(
                Source=msg['From'],
                Destinations=[recipient],
                RawMessage={'Data': msg.as_string()}
            )
            print(f"Email sent! Message ID: {response['MessageId']}")
        except Exception as e:
            print(f"Error sending email: {str(e)}")
            raise e
    
    def lambda_handler(event, context):
        try:
            print("Starting registration categorization...")
            categorize_registrations()
            
            print("Generating HTML report...")
            html_content = generate_html_output()
            
            print("Sending email...")
            subject = f"End User Messaging SMS Registration Status Report - {datetime.now().strftime('%Y-%m-%d %H:%M')}"
            send_email(html_content, subject, RECIPIENT_EMAIL)
            
            return {'statusCode': 200, 'body': json.dumps('Registration status report generated and sent successfully')}
        except Exception as e:
            print(f"Error in lambda execution: {str(e)}")
            return {'statusCode': 500, 'body': json.dumps(f'Error generating report: {str(e)}')}

Reference: Building Lambda functions with Python

Step 4: Set Up EventBridge Rule

  1. Open the Amazon EventBridge console
  2. Click “Create rule”
  3. Name your rule (e.g., “EndUserMessagingRegistrationsMonitorSchedule”)
  4. For the event pattern, choose “Schedule” then select “Continue in EventBridge Scheduler”
  5. Configure your schedule pattern to your requirements (for example, Cron-based schedule to run at specific days and times or rate-based schedule such as every 1 day). Click “Next”.
  6. Select “Lambda” for “Target detail” and select the Lambda function created in Step 3 as the target from the dropdown. Click “Next”.
  7. For permissions:
    1. Select “Create new role for this schedule”
    2. EventBridge will automatically create a role with the necessary permissions to invoke your Lambda function
  8. Click “Next” to review your configuration
  9. Click “Create schedule”

Reference: Amazon EventBridge Scheduler

Step 5: Test the Setup

  1. Open the Lambda console and navigate to your function
  2. Click “Test” and create a test event (you can use an empty JSON object {})
  3. Run the test and check the execution results
  4. Verify that you receive an email report

Reference: Testing Lambda functions in the console

The generated email report provides a clear summary of all registrations, categorized by their status.

Figure 2: The generated email report provides a clear summary of all registrations, categorized by their status.

Understanding the Lambda Function

Let’s break down the key components of our Lambda function:

  1. Initialization: The script sets up necessary AWS clients and defines constants.
  2. categorize_registrations(): This function fetches all registrations and categorizes them based on their status and type.
  3. generate_html_output(): Creates a formatted HTML report of the registration statuses.
  4. send_email(): Uses Amazon SES to send the HTML report via email.
  5. lambda_handler(): The main entry point for the Lambda function, orchestrating the entire process.

The function categorizes registrations into four main statuses:

  • REQUIRES_UPDATES: Registrations that need attention or modifications
  • CREATED: Newly created registrations
  • REVIEWING: Registrations currently under review
  • COMPLETED: Registrations that have been approved recently (within the last 7 days by default)
Detailed view of a registration requiring updates and created registrations pending submission, including specific denial reasons if applicable and direct console links.

Figure 3: Detailed view of a registration requiring updates and created registrations pending submission, including specific denial reasons if applicable and direct console links.

Customization Options

  1. Lookback Period: Modify the COMPLETED_LOOKBACK_DAYS constant to change how far back the function checks for completed registrations.
  2. Email Formatting: Adjust the HTML and CSS in generate_html_output() to customize the email report’s appearance.
  3. Additional Data: Modify the reg_info dictionary in categorize_registrations() to include more data fields in your report.

Monitoring and Maintenance

  1. CloudWatch Logs: Regularly check the Lambda function’s CloudWatch Logs for any errors or unexpected behavior.
  2. Adjusting Schedule: If you find the current schedule doesn’t meet your needs, adjust the EventBridge rule accordingly.

Conclusion

You now have an automated system that monitors your End User Messaging SMS registrations and sends you regular, detailed status reports. This setup provides:

  • Automated visibility into registrations requiring updates or attention
  • Clear tracking of draft registrations awaiting your submission
  • Monitoring of registrations under review
  • Notifications of recently completed registrations
  • A consolidated view of all registration states through formatted email reports

This automated solution eliminates the need for manual status checking and helps ensure timely responses to registration changes. As your messaging needs grow, you can easily customize the monitoring frequency, lookback period, and report format to match your requirements.

Next Steps:

  • Consider adjusting the EventBridge schedule based on your registration volume
  • Customize the email format to highlight information most relevant to your team
  • Set up CloudWatch alarms to monitor the Lambda function’s health
  • Review and update the completed registrations lookback period as needed

For more complex scenarios, consider extending this solution with additional features like Slack notifications, registration metrics tracking, or integration with your ticketing system.

 

AWS completes Police-Assured Secure Facilities (PASF) audit in Europe (London) AWS Region

Post Syndicated from Vishal Pabari original https://aws.amazon.com/blogs/security/aws-completes-police-assured-secure-facilities-pasf-audit-in-europe-london-aws-region/

We’re excited to announce that our Europe (London) AWS Region has renewed its accreditation for United Kingdom (UK) Police-Assured Secure Facilities (PASF) for Official-Sensitive data. Since 2017, the Amazon Web Services (AWS) Europe (London) Region has been accredited under the PASF program. This demonstrates our continuous commitment to adhere to the heightened expectations of customers with UK law enforcement workloads. Our UK law enforcement customers who require PASF can continue to run their applications in the PASF-accredited Europe (London) Region in confidence.

The PASF is a long-established assurance process, used by UK law enforcement, as a method for assuring the security of facilities such as data centers or other locations that house critical business applications that process or hold police data. PASF consists of a control set of security requirements, an on-site inspection, and an audit interview with representatives of the facility.

The Police Digital Service (PDS) confirmed the accreditation renewal for AWS on May 27, 2025. A confirmation letter can be found on AWS Artifact. The UK police force and law enforcement organizations can also obtain confirmation of the compliance status of AWS through the Police Digital Service.

To learn more about our compliance and security programs, see AWS Compliance Programs. As always, we value your feedback and questions; reach out to the AWS Compliance team through the Contact Us page.

Reach out to your AWS account team if you have questions or feedback about PASF compliance.

If you have feedback about this post, submit comments in the Comments section below.

Vishal Pabari

Vishal Pabari

Vishal is a Security Assurance Program Manager at AWS, based in London, UK. Vishal is responsible for third-party and customer audits, attestations, certifications, and assessments across EMEA. Vishal previously worked in risk and control, and technology in the financial services industry.

Tea Jioshvili

Tea Jioshvili

Tea is a Manager in AWS Security Assurance based in Berlin, Germany. She leads various third-party audit programs across Europe. She previously worked in security assurance and compliance, business continuity, and operational risk management in the financial industry for multiple years.

The collective thoughts of the interwebz