Implementing least privilege access for Amazon Bedrock

Post Syndicated from Jonathan Jenkyn original https://aws.amazon.com/blogs/security/implementing-least-privilege-access-for-amazon-bedrock/

Generative AI applications often combine several services and features, such as Amazon Bedrock and large language models (LLMs), to generate content and to access potentially confidential data. This combination requires strong identity and access management controls, and those controls need to be applied at several levels: identities, resources, and networks. In this blog post, you will review the scenarios and approaches where you can apply least privilege access to applications using Amazon Bedrock. To fully benefit from the guidance in this post, you need an understanding of AWS APIs, AWS Identity and Access Management (IAM) policies, and AWS security services.

Let’s start by defining the principle of least privilege (PoLP): The PoLP is a security concept that advises granting the minimal level of access—or permissions—necessary for users, programs, or systems to perform their tasks. The main idea is that the fewer permissions an entity has, the lower the risk of malicious or accidental damage. Applying the PoLP to your use of AWS serves two purposes:

  • Security: By limiting access, you reduce the potential impact of a security incident. If a user or service has minimal permissions, the scope for any damage can be significantly reduced.
  • Operational simplicity: Managing permissions can become complex if not properly managed and maintained. Applying the PoLP to your access controls early helps keep configurations as manageable as possible.

Finally, there are regulatory frameworks that require separation of duty between roles and a documented strategy for access controls, which can be achieved in part by adhering to the PoLP.

Amazon Bedrock is a fully managed AWS service that makes high-performing foundation models (FMs) available through a single unified API. You use Amazon Bedrock through AWS APIs, which expose actions for the control plane and administration such as the configuration of Amazon Bedrock Guardrails and Amazon Bedrock Agents, in addition to data plane functional actions such as inference.

Generally, the path to using Amazon Bedrock for a production workload includes the following stages:

  • Model selection: Decide on the required features (Retrieval Augmented Generation (RAG), fine-tuning, and so on), evaluate and select a model, and approve a EULA if necessary.
  • Model adaptation: Prompt engineering, integration of Amazon Bedrock into the application, and addition of model customization if desired.
  • Model testing: Validate and test the solution.
  • Model operation: Deploy the solution and make it available. Monitor and operate the solution.

In the following sections, we go through each phase and outline how you can apply the PoLP.

Model selection

In this phase, you choose the features and models that you need to fulfill your requirements, and you define how you will apply the PoLP. Features can include, for example, model customization, Retrieval Augmented Generation (RAG), or the use of agents.

Security should be integrated into the design so that the defined controls can be implemented during the development phase. One approach to define the necessary security controls is threat modeling. Doing this exercise early in the process will simplify the upcoming phases. The results can be used later to decide on the required guardrails, potential changes to the architecture, and test cases.

In this phase, you will also decide how the solution should be deployed. Customers typically operate in a multi-account setup; therefore, the selection of target organizational units (OUs) and accounts is required. We recommend creating a new OU for generative AI applications. For details, see the deep-dive chapter on generative AI in the AWS Security Reference Architecture. We will talk later about service control policies (SCPs) and how they can be used to restrict permissions. The generative AI OU is a good place to enforce those guardrails.

Amazon Bedrock provides access to a variety of high-performing FMs from leading AI companies such as AI21 Labs, Anthropic, Cohere, Meta, Mistral AI, Stability AI, and Amazon. In this stage, you need to choose the models that you’ll use and approve them. With a third-party FM, approval might include accepting a EULA. You can limit which identities can subscribe to models, and which models they can subscribe to, so that you only accept EULAs that have been reviewed by your legal department. The following is an example of an SCP that allows account operators to enable all Anthropic FMs and a single Meta Llama FM.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowAcceptingModelEULAs",
      "Effect": "Allow",
      "Action": [
        "aws-marketplace:Subscribe"
      ],
      "Resource": "*",
      "Condition": {
        "ForAnyValue:StringEquals": {
          "aws-marketplace:ProductId": [
            "c468b48a-84df-43a4-8c46-8870630108a7",
            "b0eb9475-3a2c-43d1-94d3-56756fd43737",
            "prod-6dw3qvchef7zy",
            "prod-m5ilt4siql27k",
            "prod-ozonys2hmmpeu",
            "prod-fm3feywmwerog",
            "prod-2c2yc2s3guhqy"
          ]
        }
      }
    },
    {
      "Sid": "AllowUnsubscribingFromModels",
      "Effect": "Allow",
      "Action": [
        "aws-marketplace:Unsubscribe",
        "aws-marketplace:ViewSubscriptions"
      ],
      "Resource": "*"
    }
  ]
}
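
When the list of approved models changes often, it can help to generate this policy from data rather than editing JSON by hand. The following is a minimal Python sketch, using only the standard library, that builds the allow policy shown above from a list of product IDs. The function name build_subscribe_allow_policy is hypothetical and isn’t part of any AWS SDK; take the product IDs you pass in from the Amazon Bedrock documentation.

```python
import json


def build_subscribe_allow_policy(product_ids):
    """Return a policy dict that allows subscribing only to the given product IDs.

    Hypothetical helper for illustration; it mirrors the JSON example above.
    """
    if not product_ids:
        raise ValueError("at least one product ID is required")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowAcceptingModelEULAs",
                "Effect": "Allow",
                "Action": ["aws-marketplace:Subscribe"],
                "Resource": "*",
                "Condition": {
                    "ForAnyValue:StringEquals": {
                        # Sort and de-duplicate so the output is stable in version control.
                        "aws-marketplace:ProductId": sorted(set(product_ids))
                    }
                },
            },
            {
                "Sid": "AllowUnsubscribingFromModels",
                "Effect": "Allow",
                "Action": [
                    "aws-marketplace:Unsubscribe",
                    "aws-marketplace:ViewSubscriptions",
                ],
                "Resource": "*",
            },
        ],
    }


if __name__ == "__main__":
    # Example input: one of the product IDs from the policy above.
    print(json.dumps(build_subscribe_allow_policy(["prod-2c2yc2s3guhqy"]), indent=2))
```

Keeping the approved IDs in one reviewed list and generating the policy from it is one way to make changes to the allowlist auditable.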

While this approach works well if you’re only allowlisting actions, you might have highly privileged users that already have broad access to AWS Marketplace APIs. In such a case, you can follow a deny-all-except-a-few approach. Such a policy, using the same models as before, would look like the following example:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "DenyAcceptingAllExceptCertainModelEULAs",
      "Effect": "Deny",
      "Action": [
        "aws-marketplace:Subscribe"
      ],
      "Resource": "*",
      "Condition": {
        "ForAllValues:StringNotEquals": {
          "aws-marketplace:ProductId": [
            "c468b48a-84df-43a4-8c46-8870630108a7",
            "b0eb9475-3a2c-43d1-94d3-56756fd43737",
            "prod-6dw3qvchef7zy",
            "prod-m5ilt4siql27k",
            "prod-ozonys2hmmpeu",
            "prod-fm3feywmwerog",
            "prod-2c2yc2s3guhqy"
          ]
        }
      }
    },
    {
      "Sid": "DenyUnsubscribingAllExceptCertainModels",
      "Effect": "Deny",
      "Action": [
        "aws-marketplace:Unsubscribe",
        "aws-marketplace:ViewSubscriptions"
      ],
      "Resource": "*",
      "Condition": {
        "ForAllValues:StringNotEquals": {
          "aws-marketplace:ProductId": [
            "c468b48a-84df-43a4-8c46-8870630108a7",
            "b0eb9475-3a2c-43d1-94d3-56756fd43737",
            "prod-6dw3qvchef7zy",
            "prod-m5ilt4siql27k",
            "prod-ozonys2hmmpeu",
            "prod-fm3feywmwerog",
            "prod-2c2yc2s3guhqy"
          ]
        }
      }
    }
  ]
}
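
The ForAllValues:StringNotEquals condition can be hard to read at first. As a simplified model (not the actual IAM evaluation engine), the Deny statement matches when none of the product IDs in the request appears in the approved list. The Python sketch below captures only that logic; subscribe_is_denied and APPROVED_PRODUCT_IDS are hypothetical names chosen for this illustration.

```python
# Approved product IDs, copied from the policy above.
APPROVED_PRODUCT_IDS = frozenset({
    "c468b48a-84df-43a4-8c46-8870630108a7",
    "b0eb9475-3a2c-43d1-94d3-56756fd43737",
    "prod-6dw3qvchef7zy",
    "prod-m5ilt4siql27k",
    "prod-ozonys2hmmpeu",
    "prod-fm3feywmwerog",
    "prod-2c2yc2s3guhqy",
})


def subscribe_is_denied(request_product_ids, approved=APPROVED_PRODUCT_IDS):
    """Simplified model of ForAllValues:StringNotEquals on a Deny statement.

    The condition is true when every product ID in the request differs from
    every approved ID. all() of an empty sequence is True, which mirrors the
    ForAllValues behavior of matching when the key is absent from the request.
    """
    return all(pid not in approved for pid in request_product_ids)
```

In this model, a request for an unlisted product is denied, a request for a listed product isn’t matched by the Deny statement, and a request that carries no product ID at all is also denied.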

You can find the required product IDs used in the condition in Grant IAM permissions to request access to Amazon Bedrock foundation models.

Model adaptation

In this phase, the solution is built; that is, code is written. This is mostly identical to traditional software development; however, some areas are specific to generative AI, such as prompt engineering, prompt guardrails, model monitoring, and agent design. In this post, we focus solely on the identity and access management aspects.

Adaptation is the phase where the detailed permission sets are created. Data perimeters can be used as a conceptual tool to define and implement guardrails. Because data perimeters are typically coarse grained, they aren’t sufficient to achieve the goal of the PoLP. However, in combination with fine-grained policies, they support a defense-in-depth approach. The following data perimeters exist:

  • Identity: Only trusted identities are allowed in my network, only trusted identities can access my resources.
  • Resource: My identities can access only trusted resources, only trusted resources can be accessed from my network.
  • Network: My identities can access resources only from expected networks, my resources can only be accessed from expected networks.

For applications that use Amazon Bedrock, you can use a virtual private cloud (VPC) network construct with Amazon Virtual Private Cloud (Amazon VPC) to host them. Doing so means that you can then use AWS PrivateLink to create VPC endpoints for both data and control plane APIs. Using PrivateLink to create endpoints, it’s possible to provide access to Amazon Bedrock for VPC-bound compute resources (such as Amazon Elastic Compute Cloud (Amazon EC2), or AWS Lambda) without the need for an internet gateway. In other words, you can deploy these resources entirely in private subnets. By using resource-based policies on these endpoints, you can restrict the principals, actions, resources, and conditions related to making API calls.

Let’s assume you have a VPC with an EC2 instance running in a private subnet. The instance hosts an application that uses Amazon Bedrock model invocations, and you have created an interface VPC endpoint to connect to the Amazon Bedrock data plane. The EC2 instance is configured to use an instance profile with the <rolename> IAM role and needs to be able to invoke a single FM, Anthropic’s Claude Instant, through an Amazon Bedrock InvokeModel API call. You can apply the PoLP to the containing VPC, and thus the EC2 instance, with a custom policy on the Amazon Bedrock interface VPC endpoint. To use this approach in your own account, replace the default interface VPC endpoint policy with the following example, replacing <rolename> with the role you want to allow and <account-id> with your 12-digit account number.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowInvokingClaudeInstantV1Models",
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::<account-id>:role/<rolename>"
      },
      "Action": [
        "bedrock:InvokeModel"
      ],
      "Resource": "arn:aws:bedrock:*::foundation-model/anthropic.claude-instant-v1"
    }
  ]
}

Check out Security Objective 2: Implement a data perimeter using VPC endpoint policies for more information about this data perimeter approach.

You can define the allowed models that can be used in Amazon Bedrock directly in the policy. However, if you have multiple applications that use Amazon Bedrock, you might have to update multiple policies when a new model is approved. To complement the data perimeter approach, you can add an SCP to limit the models that can be used for inference. Because Amazon Bedrock uses a small set of APIs for inference (InvokeModel and Converse), the Resource element of a policy can be used to deny the use of unapproved models. Note that while the two policies (the SCP and the VPC endpoint policy) look similar, they work differently: VPC endpoint policies take effect only for requests that travel through the PrivateLink endpoint, so you must also make sure that this network path is the only one available; SCPs are applied to principals within the account or OU they’re attached to. Be extra careful if the calling identity resides outside of your account, because only the VPC endpoint policy will apply.

For example, imagine that you wanted to block the invocation of all Anthropic FMs across your organization in AWS Organizations, in all AWS Regions. The following SCP example applied to the OUs or AWS accounts in scope would achieve that outcome:

{
  "Version": "2012-10-17",
  "Statement": {
    "Sid": "DenyInferenceForAnthropicModels",
    "Effect": "Deny",
    "Action": [
      "bedrock:InvokeModel",
      "bedrock:InvokeModelWithResponseStream"
    ],
    "Resource": [
      "arn:aws:bedrock:*::foundation-model/anthropic.*"
    ]
  }
}
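
To check which model ARNs a wildcard resource pattern like the one above would cover, you can approximate the matching locally. The following Python sketch uses fnmatch from the standard library as a stand-in for IAM wildcard matching. This is an assumption that holds for simple * and ? patterns such as this one, but fnmatch also interprets square brackets, which IAM doesn’t. The function name model_is_blocked and the example ARNs in the usage note are illustrative.

```python
from fnmatch import fnmatchcase

# Resource pattern from the SCP above.
DENIED_RESOURCE_PATTERN = "arn:aws:bedrock:*::foundation-model/anthropic.*"


def model_is_blocked(model_arn, pattern=DENIED_RESOURCE_PATTERN):
    """Return True if the ARN matches the denied pattern.

    Approximation only: "*" matches any run of characters, including ":" and "/".
    """
    return fnmatchcase(model_arn, pattern)
```

For example, model_is_blocked("arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-instant-v1") matches the pattern, while an ARN whose model ID starts with a different provider prefix doesn’t.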

You can use the same pattern to control access to data that your application needs, such as data residing in Amazon Simple Storage Service (Amazon S3).

Model customization

A solution built on Amazon Bedrock might include model customization. The common denominator of the different customization approaches is that they include data, which is assumed to be confidential and thus in-scope for applying the PoLP. Here, we take a scenario where data is stored in Amazon S3 and can be encrypted using a customer managed AWS Key Management Service (AWS KMS) key.

Measures can be taken on multiple levels, as conceptualized in data perimeters: network, identity, and resource. Amazon Bedrock model customization uses service roles, which allows you to apply fine-grained and least-privilege access end-to-end. These service roles will be assumed by the Amazon Bedrock service principal, so that it can execute actions on your behalf. To allow the Amazon Bedrock service principal to assume the role in your account, you need to attach a trust policy to the role.

Let’s imagine that you’re running an Amazon Bedrock customization job in the us-east-1 (N. Virginia) Region. Using the following trust policy example will allow only the Amazon Bedrock service principal to assume your role.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowBedrockServicePrincipalUnderConditions",
      "Effect": "Allow",
      "Principal": {
        "Service": "bedrock.amazonaws.com"
      },
      "Action": "sts:AssumeRole",
      "Condition": {
        "StringEquals": {
          "aws:SourceAccount": "<account-id>"
        },
        "ArnEquals": {
          "aws:SourceArn": "arn:aws:bedrock:us-east-1:<account-id>:model-customization-job/*"
        }
      }
    }
  ]
}

Make sure to replace <account-id> in the preceding example trust policy with your own 12-digit account number. The aws:SourceAccount and aws:SourceArn conditions in the policy provide cross-service confused deputy prevention. The confused deputy problem is a situation where an entity that doesn’t have permission to perform an action can coerce a more privileged entity to perform the action. In AWS, cross-service impersonation can result in the confused deputy problem. Cross-service impersonation can occur when one service (the calling service) calls another service (the called service). AWS provides tools to help you protect your data for all services with service principals that have been given access to resources in your account. Both the aws:SourceArn and aws:SourceAccount global condition context keys in the role’s trust policy limit the permissions that Amazon Bedrock gives another service (in the preceding case, to the customization job) to the resource. Of the two, aws:SourceArn is the more restrictive, because it identifies the specific source of the assume-role request and not just the AWS account.
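
If you create such roles in more than one account or Region, a small generator can help avoid copy-and-paste mistakes in the placeholders. The following Python sketch builds the trust policy shown above and rejects account IDs that aren’t 12 digits. The function name build_customization_trust_policy is hypothetical, and the default Region simply mirrors the example.

```python
import re


def build_customization_trust_policy(account_id, region="us-east-1"):
    """Return the trust policy from the example above for one account and Region.

    Hypothetical helper; it only fills in the placeholders and validates the
    account ID format.
    """
    if not re.fullmatch(r"\d{12}", account_id):
        raise ValueError("account_id must be a 12-digit string")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowBedrockServicePrincipalUnderConditions",
                "Effect": "Allow",
                "Principal": {"Service": "bedrock.amazonaws.com"},
                "Action": "sts:AssumeRole",
                "Condition": {
                    # Both keys together give confused deputy protection.
                    "StringEquals": {"aws:SourceAccount": account_id},
                    "ArnEquals": {
                        "aws:SourceArn": (
                            f"arn:aws:bedrock:{region}:{account_id}"
                            ":model-customization-job/*"
                        )
                    },
                },
            }
        ],
    }
```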

You should provide only the permissions that are required to fulfill the model customization task. For example, imagine that you want to limit access to your training data, the validation data bucket, and the output bucket (where Amazon Bedrock will deliver output metrics). The following policy, attached to that same service role, provides only those permissions. Replace the <training-bucket> placeholder with the bucket name that contains your training data, <validation-bucket> with your validation bucket, and <output-bucket> with the bucket where you want to store metrics.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowAccessToTrainingAndValidationBucket",
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:ListBucket"
      ],
      "Resource": [
        "arn:aws:s3:::<training-bucket>",
        "arn:aws:s3:::<training-bucket>/*",
        "arn:aws:s3:::<validation-bucket>",
        "arn:aws:s3:::<validation-bucket>/*"
      ]
    },
    {
      "Sid": "AllowAccessToOutputBucket",
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:PutObject",
        "s3:ListBucket"
      ],
      "Resource": [
        "arn:aws:s3:::<output-bucket>",
        "arn:aws:s3:::<output-bucket>/*"
      ]
    }
  ]
}
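
A quick local check can catch accidental wildcards before a policy like this is deployed. The following Python sketch flags Allow statements that use a wildcard action (such as s3:*) or a bare "*" resource. The function names are hypothetical, the check is deliberately crude, and it isn’t a substitute for a real policy analysis tool.

```python
def _as_list(value):
    """IAM allows a single string or a list; normalize to a list."""
    return value if isinstance(value, list) else [value]


def find_broad_statements(policy):
    """Return the Sids of Allow statements with a wildcard action or resource."""
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # a single statement may be given as an object
        statements = [statements]
    findings = []
    for index, stmt in enumerate(statements):
        if stmt.get("Effect") != "Allow":
            continue
        actions = _as_list(stmt.get("Action", []))
        resources = _as_list(stmt.get("Resource", []))
        broad_action = any(a == "*" or a.endswith(":*") for a in actions)
        broad_resource = any(r == "*" for r in resources)
        if broad_action or broad_resource:
            # Fall back to a positional label when the statement has no Sid.
            findings.append(stmt.get("Sid", f"statement-{index}"))
    return findings
```

Run against the policy above, this sketch reports no findings, because the object-level wildcard in arn:aws:s3:::<training-bucket>/* is scoped to a bucket. Note that it would flag KMS key policies, where "Resource": "*" refers to the key itself, so treat its output as a prompt for review rather than a verdict.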

Complementing this approach, we recommend using a VPC for the model customization job to restrict access to the training data. Technically, this again involves a VPC endpoint resource policy because the network is using interface VPC endpoints to access your S3 bucket. This allows you to define another network control, specifically an S3 bucket policy that only allows access through a specific VPC endpoint. So, for the situation where you want to limit access for the customization job itself, you can apply a bucket policy such as the following example, replacing <training-bucket> with the bucket name that contains your training data, and <vpce-id> with the ID of the VPC endpoint that resides in your VPC:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AccessToSpecificVPCEOnly",
      "Effect": "Deny",
      "Principal": "*",
      "Action": "s3:*",
      "Resource": [
        "arn:aws:s3:::<training-bucket>",
        "arn:aws:s3:::<training-bucket>/*"
      ],
      "Condition": {
        "StringNotEquals": {
          "aws:SourceVpce": "<vpce-id>"
        }
      }
    }
  ]
}
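
One consequence of this bucket policy is easy to miss: because StringNotEquals is a negated operator, the Deny statement also matches requests that carry no aws:SourceVpce value at all, such as requests that don’t arrive through a VPC endpoint. The following Python sketch models only that decision; request_is_denied is a hypothetical name, and the endpoint ID in the usage note is a made-up example.

```python
def request_is_denied(source_vpce, allowed_vpce_id):
    """Simplified model of the Deny statement with StringNotEquals on aws:SourceVpce.

    source_vpce is None when the request did not come through a VPC endpoint.
    A missing value never equals the allowed ID, so such requests are denied.
    """
    return source_vpce != allowed_vpce_id
```

In this model, request_is_denied(None, "vpce-1a2b3c4d") is denied just like a request from a different endpoint, which is why you should plan how administrators will reach the bucket before applying such a policy.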

In addition, you would restrict the principals that can access your VPC endpoint and the actions they’re allowed to take in Amazon S3. For simplicity, we’re omitting an example policy here because it’s very similar to the one we have in the Amazon Bedrock invocation section earlier in this post.

If you need to enforce encryption in Amazon S3 using a customer managed AWS KMS key (SSE-KMS), you will need to do the following:

  • Update the bucket policy with a statement denying unencrypted content being uploaded.
  • Update the KMS key policy to allow the service role to decrypt and describe the key.

The next policy example should be added to the bucket policy and demonstrates how to deny unencrypted objects being added to an S3 bucket. Again, replace <training-bucket> with the name of the S3 bucket that contains your training data:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "DenyObjectsThatAreNotSSEKMS",
      "Effect": "Deny",
      "Principal": "*",
      "Action": "s3:PutObject",
      "Resource": "arn:aws:s3:::<training-bucket>/*",
      "Condition": {
        "Null": {
          "s3:x-amz-server-side-encryption-aws-kms-key-id": "true"
        }
      }
    }
  ]
}
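
The Null condition operator with the value "true" matches when the named key is absent from the request. As a simplified illustration of what the statement above checks, the following Python sketch treats an upload as denied unless the request explicitly carries the KMS key ID header. The function name put_is_denied is hypothetical, and the sketch ignores every other part of request evaluation.

```python
# Header that corresponds to the condition key used in the policy above.
KMS_KEY_HEADER = "x-amz-server-side-encryption-aws-kms-key-id"


def put_is_denied(request_headers):
    """Return True when the upload request does not name a KMS key.

    Header names are compared case-insensitively, as HTTP headers are.
    """
    present = {name.lower() for name in request_headers}
    return KMS_KEY_HEADER not in present
```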

Finally, in the KMS key policy, you need a statement similar to the following to allow the Amazon Bedrock service role access to the KMS key. Replace <account-id> with your 12-digit account number and <bedrock-service-role> with the role you created, which will be assumed by the Amazon Bedrock service principal. Make sure to only give the required access to decrypt data with the KMS key to the IAM role:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowUseOfKeyByBedrockRole",
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::<account-id>:role/<bedrock-service-role>"
      },
      "Action": [
        "kms:Decrypt",
        "kms:DescribeKey"
      ],
      "Resource": "*"
    }
  ]
}

Amazon Bedrock can also encrypt a customized model with a customer managed KMS key. Amazon Bedrock uses KMS key grants to encrypt the customized model and to decrypt it later when you deploy it for inference. Therefore, you need to grant the same IAM role permissions to create KMS key grants in the KMS key policy. To allow fine-grained permissions on both keys, the KMS key you use for this purpose is typically different from the one you used to encrypt the training data.

So, let’s imagine that you want to use two different roles to encrypt and decrypt the customized models. To allow the role that executes the model customization job to use your KMS key, you need to add the following policy statements to the KMS key policy, replacing <account-id> with your 12-digit account number, <region> with the Region where you run Amazon Bedrock, <bedrock-model-customization-role> with the role name you use to run the model customization job, and <invocation-role> with the name of the role you use for inference.

{
  "Version": "2012-10-17",
  "Id": "PermissionsCustomModelKey",
  "Statement": [
    {
      "Sid": "PermissionsEncryptCustomModel",
      "Effect": "Allow",
      "Principal": {
        "AWS": [
          "arn:aws:iam::<account-id>:role/<bedrock-model-customization-role>"
        ]
      },
      "Action": [
        "kms:Decrypt",
        "kms:GenerateDataKey",
        "kms:DescribeKey",
        "kms:CreateGrant"
      ],
      "Resource": "*",
      "Condition": {
        "StringLike": {
          "kms:ViaService": [
            "bedrock.<region>.amazonaws.com"
          ]
        }
      }
    },
    {
      "Sid": "PermissionsDecryptModel",
      "Effect": "Allow",
      "Principal": {
        "AWS": [
          "arn:aws:iam::<account-id>:role/<invocation-role>"
        ]
      },
      "Action": [
        "kms:Decrypt"
      ],
      "Resource": "*",
      "Condition": {
        "StringLike": {
          "kms:ViaService": [
            "bedrock.<region>.amazonaws.com"
          ]
        }
      }
    }
  ]
}

By using KMS key grants, you can revoke the permissions you granted to the service role after the customization job is done, thus reducing the permissions to least privilege. Also, Amazon Bedrock uses secondary KMS key grants for model encryption, which means that they’re automatically retired as soon as the operation that Amazon Bedrock performs on behalf of the customer is completed. The documentation topic Encryption of model customization jobs and artifacts describes in more detail how grants are used.

To complement these IAM policy guardrails, you can add network controls to reduce the scope of the permissions of the process. Because we focus on IAM policies in this post, we won’t go into details here but only mention how the process works.

When you start a model customization job, a model training job is triggered within the model deployment account. The training job takes a base model from its S3 bucket, then connects to the S3 bucket that holds the customization training data to start the customization. This can be done through your VPC, where you specify a VPC configuration such as subnets and security groups, and the training job places an elastic network interface (ENI) into that VPC as specified. A request to the S3 bucket to read the training data now adheres to whatever routing rules are present in the VPC for that ENI. The VPC routing and security group attached to the ENI can be used to limit networking access to the model customization job.

Amazon Bedrock Agents

Amazon Bedrock Agents offers the capability to build and configure autonomous agents for applications. You can find more information about Amazon Bedrock Agents in Automate tasks in your application using AI agents.

Using an Amazon Bedrock agent also provides certain security properties that are applied to an inference task. For example, at the time of writing, there is no IAM condition key for the bedrock:InvokeModel API to require an Amazon Bedrock guardrail being attached to that same call. However, you can require that inferences are invoked through a call to an agent that has specific Amazon Bedrock guardrails configured.

Let’s say that you want to create a role that is explicitly allowed to invoke Amazon Bedrock models only through a specific Amazon Bedrock agent. The following IAM principal permissions policy example assumes that the specified Amazon Bedrock agent has approved Amazon Bedrock guardrails configured. Again, replace <region>, <account-id>, <bedrock-agent-id>, and <bedrock-agent-alias-id> with the values of your Amazon Bedrock agent.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowAgentInvocation",
      "Effect": "Allow",
      "Action": [
        "bedrock:InvokeAgent"
      ],
      "Resource": "arn:aws:bedrock:<region>:<account-id>:agent-alias/<bedrock-agent-id>/<bedrock-agent-alias-id>"
    },
    {
      "Sid": "DenyDirectInvocation",
      "Effect": "Deny",
      "Action": [
        "bedrock:InvokeModel",
        "bedrock:InvokeModelWithResponseStream",
        "bedrock:CreateModelInvocationJob"
      ],
      "Resource": "arn:aws:bedrock:*::foundation-model/*"
    }
  ]
}

Provided that the Amazon Bedrock agent is configured by a systems administrator or operator with an approved Amazon Bedrock guardrail, the principal with the preceding policy attached to it will be able to invoke it with a prompt, and won’t directly invoke an Amazon Bedrock model. This strategy for making sure that Amazon Bedrock guardrails are applied to all Amazon Bedrock invocations is currently not possible with the bedrock:InvokeModel and bedrock:InvokeModelWithResponseStream APIs, because they don’t have a condition key to match an Amazon Bedrock guardrail to. In addition, denying bedrock:InvokeModel and bedrock:InvokeModelWithResponseStream also denies the Converse APIs and StartAsyncInvoke APIs, so there’s no need to add these separately to the Deny statement.
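
The policy relies on the rule that an explicit Deny overrides any Allow, and that anything not explicitly allowed is implicitly denied. The following Python sketch is a deliberately small model of that rule for a single identity-based policy. It ignores conditions, principals, SCPs, and resource-based policies, and it uses fnmatch as an approximation of IAM wildcard matching. The account ID, agent ID, and alias ID are made-up example values, and evaluate is a hypothetical function.

```python
from fnmatch import fnmatchcase

# Example values only; replace with your own agent alias ARN.
AGENT_ALIAS_ARN = "arn:aws:bedrock:us-east-1:111122223333:agent-alias/AGENT12345/ALIAS12345"

POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {"Sid": "AllowAgentInvocation", "Effect": "Allow",
         "Action": ["bedrock:InvokeAgent"], "Resource": AGENT_ALIAS_ARN},
        {"Sid": "DenyDirectInvocation", "Effect": "Deny",
         "Action": ["bedrock:InvokeModel",
                    "bedrock:InvokeModelWithResponseStream",
                    "bedrock:CreateModelInvocationJob"],
         "Resource": "arn:aws:bedrock:*::foundation-model/*"},
    ],
}


def _as_list(value):
    return value if isinstance(value, list) else [value]


def evaluate(policy, action, resource):
    """Return "Deny", "Allow", or "ImplicitDeny" for one action and resource."""
    decision = "ImplicitDeny"
    for stmt in _as_list(policy["Statement"]):
        action_match = any(fnmatchcase(action, p) for p in _as_list(stmt["Action"]))
        resource_match = any(fnmatchcase(resource, p) for p in _as_list(stmt["Resource"]))
        if not (action_match and resource_match):
            continue
        if stmt["Effect"] == "Deny":
            return "Deny"  # an explicit Deny always wins
        decision = "Allow"
    return decision
```

With this model, invoking the agent alias evaluates to Allow, invoking a foundation model directly evaluates to Deny, and any other Amazon Bedrock action falls through to ImplicitDeny.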

Because this strategy verifies the use of specific Amazon Bedrock guardrails, you can also use it for the enforcement of specific prompts, IAM service roles, knowledge bases, prompt and completion content restrictions, KMS keys, and FMs in inference invocations. For this approach to be effective, you need to also limit the principals that can create and update Amazon Bedrock agent configurations. Again, this can be restricted using an IAM policy, which is attached to only specific principals.

The following is an example IAM policy statement that gives an attached IAM principal the ability to update the configuration of a specific Amazon Bedrock agent. Replace <region>, <account-id>, and <agent-id> with the Region, account identifier, and agent identifier that you’re using. If you want this to apply to all agents, replace <agent-id> with an asterisk (*).

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowUpdatingBedrockAgents",
      "Effect": "Allow",
      "Action": [
        "bedrock:DisassociateAgentKnowledgeBase",
        "bedrock:GetAgent*",
        "bedrock:ListAgent*",
        "bedrock:PrepareAgent",
        "bedrock:TagResource",
        "bedrock:UntagResource",
        "bedrock:UpdateAgent*"
      ],
      "Resource": [
        "arn:aws:bedrock:<region>:<account-id>:agent/<agent-id>"
      ]
    }
  ]
}

Where agents aren’t suitable, users or applications performing inference against the Amazon Bedrock models will need permissions to call the bedrock:InvokeModel, bedrock:InvokeModelWithResponseStream, or bedrock:CreateModelInvocationJob actions. In these cases, it can be desirable to limit the target models, following an allowlisting approach. Also, such permissions would only be attached to roles or applications that need to use them.

The following is an example of such a policy that restricts invocation to Anthropic’s Claude Instant.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowInvocationOnAnthropicClaudeInstantV1",
      "Effect": "Allow",
      "Action": [
        "bedrock:InvokeModel",
        "bedrock:InvokeModelWithResponseStream",
        "bedrock:CreateModelInvocationJob"
      ],
      "Resource": "arn:aws:bedrock:*::foundation-model/anthropic.claude-instant-v1"
    }
  ]
}

You can include detective or reactive controls using Amazon EventBridge rules to detect model invocations that don’t use appropriate Amazon Bedrock guardrails, but that’s outside the scope of this post.

Model testing

Testing is the last step before the solution is deployed. Types of tests include unit tests, integration tests, user acceptance tests, penetration testing, and more. In this phase, you can again verify that the permissions that were assigned are indeed least privilege.

Especially in functional tests where data is involved, it’s important to consider that the data used for testing might be confidential. This is typically true when no synthetic test data is generated for the testing process. Controls that restrict access to this data, and to logs that might contain pieces of it, need to be the same as the ones you will apply in a production environment. The fact that you are only testing the solution doesn’t mean that data access controls aren’t needed.

As discussed earlier in this post, controls are activated not only on identities, but also on the network and on resources. All of these should be validated and their effectiveness confirmed in this phase. Tests include but aren’t limited to:

  • Validate that you can only perform allowed actions in Amazon Bedrock through VPC endpoints, and that actions that don’t use VPC endpoints are blocked.
  • Validate the effectiveness of the resource policies on VPC endpoints by making sure that they can only be used by authenticated and authorized principals.
  • When using knowledge bases, validate that only the Amazon Bedrock service principal can access them.
  • When using Amazon Bedrock guardrails, evaluate their effectiveness. Because of the nature of generative AI applications, input and output data can vary widely. Therefore, make sure to test guardrails with a reasonably large number of prompts.
  • If model invocation logging is activated, validate that logs are correctly written and protected with IAM permissions and encryption.
  • Validate that only the required personnel can access these logs, because they might contain sensitive data. Consider automatically sanitizing and forwarding them to a new CloudWatch log group.
  • Validate that all relevant Amazon Bedrock API calls are being properly logged in AWS CloudTrail, and that you can effectively monitor and alert on any suspicious activity.
  • Make sure that sensitive information—such as prompts and responses—isn’t being stored in the CloudTrail logs or in any trace output.

The threat model that you might have created in the model selection phase can provide valuable input for security-related test cases.

Model operation

In this phase, the solution is finally deployed into production. Operators need Amazon Bedrock control plane permissions to provision and manage Amazon Bedrock resources such as agents, guardrails, prompt libraries, and knowledge bases. They should only get the control plane permissions to provision and manage the Amazon Bedrock features that are being used. These same operators should have access to invoke or configure the Amazon Bedrock service features or their resources (Amazon Bedrock Agents, Amazon Bedrock Guardrails, and prompt libraries) only through an authorized pipeline. This immutable infrastructure approach restricts human users from creating situations or configurations that would otherwise allow unapproved access to the data plane, untracked changes to the control plane, or potentially disruptive updates to your application.

Alternatively, to reduce the assigned permissions to the absolute minimum, automated deployments using pipelines and pipeline roles can be used. This not only provides versioned infrastructure but also adheres to the PoLP by not providing access to human identities.

After deployment, the solution is live and being accessed by real users. At this point, topics such as monitoring, logging, and incident response become relevant. While Amazon Bedrock by default doesn’t store inference or response data, it’s recommended that you activate logging of those elements to continuously verify the accuracy of your generative AI application.

By using this solution, you reduce access to a minimum in the following areas:

  • Logged prompts and responses
  • Data available through knowledge bases, RAG, or similar sources
  • The ability to change the infrastructure

Use a dedicated, least-privileged role for each task. This helps reduce the permissions scope to a minimum. Also, because a principal must explicitly assume the role for a given task, the risk of unintended changes is reduced.

By following the AWS Security Reference Architecture, security monitoring data is consolidated in a central security account. This allows a comprehensive, central overview of the security posture of your infrastructure.

Logging sensitive information

An important operational aspect is logging potentially sensitive data that’s sent to or received from the LLM. While Amazon Bedrock doesn’t store prompts or responses, you can use model invocation logging to collect invocation logs, model input data, and model output data for all Amazon Bedrock invocations in your AWS account. Model invocation logging isn’t enabled by default. After it’s enabled, prompts, completions, or both are logged to the configured log destination for all invocations of all approved models. Valid log destinations for prompt and completion logs are Amazon S3 and CloudWatch. Logs written to these destinations can optionally be encrypted using a supplied KMS key.
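As a rough illustration, enabling model invocation logging from Python might look like the following sketch. It assumes boto3's `bedrock` client and its `put_model_invocation_logging_configuration` call; the helper names, the log group, the role ARN, and the bucket are hypothetical placeholders rather than values from this post.

```python
def build_logging_config(log_group_name, role_arn, bucket_name=None,
                         key_prefix="bedrock-logs"):
    """Build the loggingConfig argument for model invocation logging.

    All names passed in are placeholders chosen by the caller.
    """
    config = {
        "cloudWatchConfig": {
            "logGroupName": log_group_name,
            "roleArn": role_arn,  # role Amazon Bedrock assumes to write logs
        },
        "textDataDeliveryEnabled": True,
        "imageDataDeliveryEnabled": False,
        "embeddingDataDeliveryEnabled": False,
    }
    if bucket_name is not None:
        # Optional second destination: Amazon S3
        config["s3Config"] = {"bucketName": bucket_name, "keyPrefix": key_prefix}
    return config


def enable_invocation_logging(bedrock_client, logging_config):
    """Apply the configuration; bedrock_client is e.g. boto3.client("bedrock")."""
    return bedrock_client.put_model_invocation_logging_configuration(
        loggingConfig=logging_config
    )
```

Keeping the configuration in a pure function makes it easy to review which data types are delivered before the call is made through an authorized pipeline.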

The contents of these logs might contain sensitive information in the prompt provided by the user or the reply generated by the model. As such, access to these logs should be restricted to personnel and machine processes that require and are authorized to access this classification of data. There are strategies such as using Amazon Macie on the Amazon S3 logs and CloudWatch Logs data protection capabilities to detect, monitor, and redact this sensitive information from logs, but that’s outside the scope of this post.

Even with Amazon Bedrock guardrails in place, the contents of these logs contain the pre-guardrailed user input, and so you must assume that these prompt and completion logs contain sensitive information. In this case, best practice is to encrypt log data with a KMS key, apply a data protection policy to the log group, and define at least three IAM roles:

  1. BasicCompletionLogReviewer: An IAM role whose sole purpose is to access and review the redacted version of these logs.
  2. SensitiveDataCompletionLogReviewer: A restricted IAM role whose sole purpose is to access and review the unredacted version of these logs.
  3. CompletionLogAdmin: A restricted IAM role whose sole purpose is to create, view, and delete data protection policies that can send audit findings to Amazon S3 and CloudWatch destinations.
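The data protection policy that separates the redacted and unredacted views could be generated along these lines. This is a minimal sketch: it assumes the CloudWatch Logs data protection policy format with one audit statement and one de-identify statement over the same data identifiers, and the policy name and identifier choices are illustrative only.

```python
import json

# Prefix of AWS managed data identifiers used in data protection policies
DATA_IDENTIFIER_PREFIX = "arn:aws:dataprotection::aws:data-identifier/"


def build_data_protection_policy(name, identifiers):
    """Build a policy that audits and masks the given managed identifiers."""
    arns = [DATA_IDENTIFIER_PREFIX + identifier for identifier in identifiers]
    return {
        "Name": name,
        "Version": "2021-06-01",
        "Statement": [
            {
                "Sid": "audit",
                "DataIdentifier": arns,
                # Empty destination: findings are not delivered anywhere extra
                "Operation": {"Audit": {"FindingsDestination": {}}},
            },
            {
                "Sid": "redact",
                "DataIdentifier": arns,
                "Operation": {"Deidentify": {"MaskConfig": {}}},
            },
        ],
    }


policy_document = json.dumps(
    build_data_protection_policy("completion-log-protection", ["EmailAddress"])
)
```

The resulting string could then be passed as `policyDocument` to the CloudWatch Logs `put_data_protection_policy` call by the CompletionLogAdmin role.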

To allow reading log events in a specific log group, attach a policy such as the following to the BasicCompletionLogReviewer role, replacing <region>, <account-id>, <log-group-name>, and <key-id> with values that match your CloudWatch log group and the KMS key that encrypts it. The KMS statement uses the key ARN because IAM policy statements identify a KMS key by its key ARN, not by an alias ARN.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowReadingMaskedLogStream",
      "Effect": "Allow",
      "Action": [
        "logs:DescribeLogStreams",
        "logs:GetLogEvents"
      ],
      "Resource": "arn:aws:logs:<region>:<account-id>:log-group:<log-group-name>:*"
    },
    {
      "Sid": "AllowDecryptOfLogEvents",
      "Effect": "Allow",
      "Action": [
        "kms:Decrypt"
      ],
      "Resource": "arn:aws:kms:<region>:<account-id>:key/<key-id>"
    }
  ]
}

With an active data protection policy in place, the preceding policy won’t allow access to the unredacted version of these logs. To give the SensitiveDataCompletionLogReviewer role access to the unredacted versions, you need to add the logs:Unmask action, replacing <region>, <account-id>, <log-group-name>, and <key-id> with values that match your CloudWatch log group and the KMS key that encrypts it. As before, the KMS statement uses the key ARN because IAM policy statements identify a KMS key by its key ARN, not by an alias ARN.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowReadingUnmaskedLogEvents",
      "Effect": "Allow",
      "Action": [
        "logs:DescribeLogStreams",
        "logs:GetLogEvents",
        "logs:Unmask"
      ],
      "Resource": "arn:aws:logs:<region>:<account-id>:log-group:<log-group-name>:*"
    },
    {
      "Sid": "AllowDecryptOfLogEvents",
      "Effect": "Allow",
      "Action": [
        "kms:Decrypt"
      ],
      "Resource": "arn:aws:kms:<region>:<account-id>:key/<key-id>"
    }
  ]
}
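Because the two reviewer policies differ only in the logs:Unmask action, it can help to generate both from one template so they cannot drift apart. The following is a sketch under stated assumptions: the function name is hypothetical, and the KMS key is passed in as a full key ARN, which is an assumption of this example.

```python
def log_reviewer_policy(region, account_id, log_group_name, kms_key_arn,
                        allow_unmask=False):
    """Return an IAM policy document for a log reviewer role.

    allow_unmask=True adds logs:Unmask for the sensitive-data reviewer.
    """
    actions = ["logs:DescribeLogStreams", "logs:GetLogEvents"]
    if allow_unmask:
        actions.append("logs:Unmask")
    log_group_arn = (
        f"arn:aws:logs:{region}:{account_id}:log-group:{log_group_name}:*"
    )
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowReadingLogEvents",
                "Effect": "Allow",
                "Action": actions,
                "Resource": log_group_arn,
            },
            {
                "Sid": "AllowDecryptOfLogEvents",
                "Effect": "Allow",
                "Action": ["kms:Decrypt"],
                "Resource": kms_key_arn,
            },
        ],
    }
```

Reviewing the difference between the two generated documents then comes down to a single flag.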

The policy for the CompletionLogAdmin role requires different permissions. The following sample policy allows a user to create, view, and delete data protection policies that can send audit findings to all three types of audit destinations. It doesn’t permit the user to view unmasked data. Replace <region>, <account-id>, <key-id>, <delivery-stream-id>, <bucket-name>, and <log-group-name> with the values that match your setup. Note that the policy includes a statement that explicitly denies the attached role access to decrypt the logs with the configured KMS key, aligning with the PoLP. That statement uses the key ARN because IAM policy statements identify a KMS key by its key ARN, not by an alias ARN:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowLogGroupsManagement1",
      "Effect": "Allow",
      "Action": [
        "logs:CreateLogDelivery",
        "logs:PutResourcePolicy",
        "logs:DescribeLogGroups",
        "logs:DescribeResourcePolicies"
      ],
      "Resource": "*"
    },
    {
      "Sid": "AllowLogGroupsManagement2",
      "Effect": "Allow",
      "Action": [
        "logs:GetDataProtectionPolicy",
        "logs:DeleteDataProtectionPolicy",
        "logs:PutDataProtectionPolicy",
        "s3:PutBucketPolicy",
        "firehose:TagDeliveryStream",
        "s3:GetBucketPolicy"
      ],
      "Resource": [
        "arn:aws:firehose:::deliverystream/<delivery-stream-id>",
        "arn:aws:s3:::<bucket-name>",
        "arn:aws:logs:::log-group:<log-group-name>:*"
      ]
    },
    {
      "Sid": "AllowListKMSKeys",
      "Effect": "Allow",
      "Action": [
        "kms:ListKeys",
        "kms:ListAliases"
      ],
      "Resource": "*"
    },
    {
      "Sid": "DenyDecryptOfLogEvents",
      "Effect": "Deny",
      "Action": [
        "kms:Decrypt"
      ],
      "Resource": "arn:aws:kms:<region>:<account-id>:key/<key-id>"
    }
  ]
}

This approach helps to avoid inadvertent access to or exposure of this sensitive data source and upholds the PoLP by separating duties.

Review IAM permissions on a periodic basis

Managing permissions is an ongoing effort because requirements and functionality change over time. Therefore, we recommend regularly reviewing the assigned permissions and verifying that they aren’t overly permissive. For example, if you have a Lambda function that makes API calls to Amazon Bedrock, and changes are made to that function that require additional permissions (perhaps the use of a new model), then it’s acceptable to update the policy attached to the IAM role that the function uses. However, it’s not always obvious whether permissions for the earlier model are still needed in the same policy, and permissions might be widened unnecessarily to include all models. When applying the PoLP, it’s important that policies be tested at the time they’re deployed to make sure that they meet the exact application needs and no more, but also that the presumed needs are reviewed periodically.

Using AWS IAM Access Analyzer, you can review and simulate proposed changes to IAM policies to ensure their suitability for a given application or function. You can also use IAM Access Analyzer to review unused permissions over time. This gives system operators an opportunity to inspect unused permissions and make an informed decision about removing them from policies used with Amazon Bedrock applications. Remember that some permissions are dormant by design and held ready for rare use cases such as incident response and recovery. Your review shouldn’t assume that an unused permission is unnecessary; treat it instead as an opportunity to review the need for the permission.
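As a complement to IAM Access Analyzer, and not a substitute for it, a periodic review can include a simple local check for Bedrock permissions that were widened to all models. The following sketch is only a heuristic; the function name and the idea of flagging `foundation-model/*` resources are assumptions of this example.

```python
def find_broad_bedrock_statements(policy):
    """Return identifiers of Allow statements that grant Bedrock access broadly.

    A statement is flagged when it allows Bedrock actions and either uses a
    wildcard action or targets all resources or all foundation models.
    """
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):
        statements = [statements]
    findings = []
    for index, statement in enumerate(statements):
        if statement.get("Effect") != "Allow":
            continue
        actions = statement.get("Action", [])
        resources = statement.get("Resource", [])
        if isinstance(actions, str):
            actions = [actions]
        if isinstance(resources, str):
            resources = [resources]
        touches_bedrock = any(
            a == "*" or a.lower().startswith("bedrock:") for a in actions
        )
        wildcard_action = any(a in ("*", "bedrock:*") for a in actions)
        wildcard_resource = any(
            r == "*" or r.endswith("foundation-model/*") for r in resources
        )
        if touches_bedrock and (wildcard_action or wildcard_resource):
            findings.append(statement.get("Sid", f"statement-{index}"))
    return findings
```

Flagged statements are candidates for review, not automatic removal, for the same reason that unused permissions are.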

Finally, align monitoring of new Amazon Bedrock APIs with your IAM strategy. Especially when using denylisting approaches, it’s important to consider that services will announce new APIs, capabilities, and FMs over time. An example of this was the announcement of the Converse API, which provides functionality similar to Invoke, but in a consistent and thus simpler way. Considering such changes is therefore an integral part of your regular policy review process.

Strong identity and access management is a journey, not a one-time action.

Conclusion

In this post, we have demonstrated some ways that you can apply the principle of least privilege (PoLP) to large language model (LLM)-based applications that use Amazon Bedrock. We have discussed the security considerations of each phase of the development lifecycle and provided examples that you can use as a starting point to implement your own PoLP strategy. It’s important that security doesn’t start late in the process; think about risks and the required actions as early as possible to make sure that your strategy is effective when your application goes live.

Finally, remember that the field of generative AI is moving quickly. We believe that it has the potential to transform virtually every customer experience. From a security perspective, this means that the threat landscape will change and evolve over time. Make sure to constantly adapt to new risks; evaluate and integrate them into your PoLP strategy.

Your AWS account team and specialists are happy to assist you on this journey.

 
 

Jonathan Jenkyn

Jonathan (“JJ”) Jenkyn is a Sr Security Assurance Solution Architect with AWS Security Assurance Services. With over 30 years of experience, he is a proven security leader who delivers robust cloud security outcomes. JJ is also an active member of the AWS People with Disabilities affinity group and enjoys running, cycling, and spending time with his family.
Michael Tschannen

Michael is an EMEA Sr. Specialist SA Security with AWS, based in Switzerland. After working as a penetration tester early in his career, he decided to move to the defensive side. Since then, he’s been protecting customers and their data every day by making security and privacy an integral part of every solution he builds.

Create a serverless custom retry mechanism for stateless queue consumers

Post Syndicated from Kaizad Wadia original https://aws.amazon.com/blogs/architecture/create-a-serverless-custom-retry-mechanism-for-stateless-queue-consumers/

Serverless queue processors like AWS Lambda often exist in architectures where they pull messages from queues such as Amazon Simple Queue Service (Amazon SQS) and interact with downstream services or external APIs in a distributed architecture. Robust retry approaches are necessary to provide reliable message processing due to the susceptibility of these downstream services to short-term outages or throttling. This often requires implementing special retry logic with features like dead-letter queues (DLQs) and exponential backoff to handle these cases gracefully, making sure that the downstream systems don’t get overwhelmed by too many retries.

In this post, we propose a solution that handles serverless retries when the workflow’s state isn’t managed by an additional service.

Solution overview

Some custom retry logic is required when Lambda functions interact with downstream services after consuming messages from SQS queues. This strategy uses Amazon EventBridge Scheduler and code in Lambda. The core concept is to implement a robust retry mechanism for failed message processing attempts using EventBridge Scheduler. When a Lambda function encounters a problem while processing a message, it raises a specific error. Upon catching this error in a catch block, the function creates an EventBridge schedule. As a result, the message is sent back to the SQS queue and becomes available for processing again at a specified future time.

In this approach, the retry mechanism has fine-grained control over retry timing and can support various techniques, including exponential backoff and linear retry intervals. This approach separates the retry logic from the code that processes the message itself, which keeps the Lambda function performant. To handle messages whose retries are exhausted, the solution uses a DLQ to keep such messages separate from the main queue.

The following diagram illustrates the solution architecture.

The error handling and retry choice logic in the Lambda function code form the basis for how this custom retry mechanism is implemented. If there is an error while processing the message, the function raises a specific exception. Raising the exception then initiates the retry flow. A try-catch block catches this exception and calls a function that interfaces with the EventBridge Scheduler API to build a custom schedule. To configure the schedule, we include the destination SQS queue and the intended timestamp when the message is meant to be retried. We can change the delay with some code modifications depending on a number of parameters, such as error type, number of prior retries, or other custom backoff schemes.
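The flow described above could be sketched in Python as follows. This is not the post's actual implementation: `RetryableError`, `build_retry_schedule`, and `handle_message` are hypothetical names, and the sketch assumes a boto3 `scheduler` client whose `create_schedule` call accepts a one-time `at(...)` expression with the SQS queue as target.

```python
import json
import uuid
from datetime import datetime, timedelta, timezone


class RetryableError(Exception):
    """Hypothetical exception raised when a message should be retried later."""


def build_retry_schedule(queue_arn, scheduler_role_arn, message_body, retry_at):
    """Build keyword arguments for EventBridge Scheduler's create_schedule."""
    retry_at_utc = retry_at.astimezone(timezone.utc)
    return {
        "Name": f"retry-{uuid.uuid4().hex}",  # unique name per retry
        "ScheduleExpression": f"at({retry_at_utc.strftime('%Y-%m-%dT%H:%M:%S')})",
        "ScheduleExpressionTimezone": "UTC",
        "FlexibleTimeWindow": {"Mode": "OFF"},
        "ActionAfterCompletion": "DELETE",  # remove the one-time schedule afterwards
        "Target": {
            "Arn": queue_arn,               # destination SQS queue
            "RoleArn": scheduler_role_arn,  # role the scheduler uses to send
            "Input": json.dumps(message_body),
        },
    }


def handle_message(message_body, process, scheduler_client, queue_arn,
                   scheduler_role_arn, delay_seconds=120):
    """Process one message; on a retryable failure, schedule a delayed retry."""
    try:
        process(message_body)
        return "processed"
    except RetryableError:
        retry_at = datetime.now(timezone.utc) + timedelta(seconds=delay_seconds)
        scheduler_client.create_schedule(
            **build_retry_schedule(queue_arn, scheduler_role_arn,
                                   message_body, retry_at)
        )
        return "scheduled"
```

In a Lambda function, `scheduler_client` would typically be `boto3.client("scheduler")`, and `delay_seconds` could be derived from the error type or the number of prior retries.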

As part of this approach, we use SQS message attributes for idempotency and to track retries. On each retry, the function adds the new timestamp to an array in the message body. If the function consumes the message more times than the maximum retry limit (determined by the array of retry attempts) it sends the message to the DLQ without rescheduling.
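The retry bookkeeping might look like the following sketch, which keeps the list of attempt timestamps in the message body as described. The field name `retry_attempts` and the function name are assumptions made for illustration.

```python
from datetime import datetime, timezone


def record_attempt(body, max_retries, now=None):
    """Decide what to do with a failed message.

    Returns ("retry", updated_body) with a new timestamp appended, or
    ("dlq", body) once the number of recorded attempts reaches max_retries.
    """
    attempts = list(body.get("retry_attempts", []))
    if len(attempts) >= max_retries:
        return "dlq", body
    now = now or datetime.now(timezone.utc)
    attempts.append(now.isoformat())
    # Return a copy so the caller's original message body is left unchanged
    return "retry", {**body, "retry_attempts": attempts}
```

The length of the recorded list doubles as the attempt number, which a backoff calculation can use to pick the next delay.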

The solution also integrates a DLQ so that messages don’t stay in the main processing queue and get retried forever. The Lambda function sends a message to the DLQ either when the message exceeds the maximum retry limit or when certain error scenarios require it to stop early. This queue keeps all failed messages until they can be manually reviewed, corrected, or reprocessed.

Considerations and best practices

There are a few key factors to keep in mind while putting this custom retry system into practice. One aspect is handling partial failures, that is, processing where only part of the steps are complete. In such cases, we could use some form of compensating action or rollback to maintain consistency in data and avoid discrepancies downstream of the queue consumer.

Another crucial factor is controlling retry limits. Although the system design allows for variable retry limits, we must balance resource usage and resilience. Too many retries might cause higher costs and lead to slowdowns or service degradation. That is why we recommend that appropriate retry limits are set, considering probable failure rates, SLAs, and business consequences of failures.

We must also consider that EventBridge Scheduler has a granularity of 1 minute, and there is additional latency between the queue and the function, so the mechanism will not be completely precise. In principle, the scheduler sets the minimum time before which the message can be processed, making sure the Lambda function adheres to the rate limits at a minimum. This could also result in additional delays, so the mechanism would need to be adjusted for time-sensitive applications to account for these delays.
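One way to account for the one-minute granularity is to compute the backoff delay first and then round the target time up to the next whole minute, so the retry is never scheduled earlier than intended. The following sketch assumes exponential backoff with a cap; the function names and default values are illustrative.

```python
from datetime import datetime, timedelta, timezone


def backoff_delay_seconds(attempt, base_seconds=60, cap_seconds=3600):
    """Exponential backoff: base * 2^attempt, limited to cap_seconds."""
    return min(cap_seconds, base_seconds * 2 ** attempt)


def round_up_to_minute(moment):
    """Round a datetime up to the next whole minute (no-op if already aligned)."""
    if moment.second == 0 and moment.microsecond == 0:
        return moment
    return moment.replace(second=0, microsecond=0) + timedelta(minutes=1)


def next_retry_time(now, attempt, base_seconds=60, cap_seconds=3600):
    """Earliest minute-aligned time at which the message may be retried."""
    delay = backoff_delay_seconds(attempt, base_seconds, cap_seconds)
    return round_up_to_minute(now + timedelta(seconds=delay))
```

Rounding up trades a little extra latency for the guarantee that the minimum delay is respected, which matters when the delay exists to honor a downstream rate limit.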

Because the solution might deal with variable volumes of messages and processing loads, scaling issues are also important. For example, the Lambda concurrency and retention period for the queue represent resource configurations we should monitor and adjust for optimal performance and cost.

Finally, we need to consider security as part of the solution. If the downstream service runs in a virtual private cloud (VPC), we would also need to place the Lambda function in the VPC. In this case, we would need to access EventBridge Scheduler through AWS PrivateLink, which enables secure and performant access to services from within a VPC.

Additionally, it is important to implement the AWS Identity and Access Management (IAM) roles (mainly the Lambda function role) with the principle of least privilege. The function role needs permission to create the EventBridge schedule, along with iam:PassRole so that it can pass the scheduler’s IAM role to EventBridge Scheduler. The scheduler’s role only needs permission to place a message into the source queue. We also need to give the function access to place a message in the DLQ and receive messages from the source queue.
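The two roles described above could be expressed as policy documents like the ones in this sketch. The statement IDs and function names are hypothetical, the ARNs are supplied by the caller, and the extra SQS actions for deleting messages and reading queue attributes are an assumption about what a Lambda SQS consumer typically needs.

```python
def lambda_role_policy(source_queue_arn, dlq_arn, scheduler_role_arn,
                       schedule_arn_pattern):
    """Least-privilege policy sketch for the queue-consuming Lambda function."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ConsumeSourceQueue",
                "Effect": "Allow",
                "Action": ["sqs:ReceiveMessage", "sqs:DeleteMessage",
                           "sqs:GetQueueAttributes"],
                "Resource": source_queue_arn,
            },
            {
                "Sid": "SendToDlq",
                "Effect": "Allow",
                "Action": ["sqs:SendMessage"],
                "Resource": dlq_arn,
            },
            {
                "Sid": "CreateRetrySchedules",
                "Effect": "Allow",
                "Action": ["scheduler:CreateSchedule"],
                "Resource": schedule_arn_pattern,
            },
            {
                "Sid": "PassSchedulerRole",
                "Effect": "Allow",
                "Action": ["iam:PassRole"],
                "Resource": scheduler_role_arn,
                # Only allow passing the role to EventBridge Scheduler
                "Condition": {"StringEquals": {
                    "iam:PassedToService": "scheduler.amazonaws.com"}},
            },
        ],
    }


def scheduler_role_policy(source_queue_arn):
    """The scheduler's role only needs to send messages to the source queue."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "SendToSourceQueue",
            "Effect": "Allow",
            "Action": ["sqs:SendMessage"],
            "Resource": source_queue_arn,
        }],
    }
```

Note that the function role cannot send to the source queue and the scheduler role cannot read from it, which keeps each identity limited to its own step of the retry loop.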

Monitoring and troubleshooting

The custom retry mechanism demands efficient monitoring and debugging. With that in mind, we can observe the behavior of the system and identify potential problems by using Amazon CloudWatch logs and metrics.

The number of invocations of Lambda functions, related error rates, runtimes, and use of DLQ are the key indicators that we should monitor. It would be worth setting up alarms in CloudWatch to send an alert or initiate automated actions when the Lambda function’s metrics surpass certain predetermined thresholds. By doing this, we proactively detect and resolve certain issues pertaining to the function.
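For the DLQ indicator, an alarm definition could be assembled as in the sketch below and passed to CloudWatch's `put_metric_alarm`. The alarm name, threshold, period, and the SNS topic used for notification are assumptions to adjust for your workload.

```python
def dlq_alarm_kwargs(dlq_name, sns_topic_arn, threshold=1):
    """Keyword arguments for an alarm that fires when the DLQ holds messages."""
    return {
        "AlarmName": f"{dlq_name}-messages-visible",
        "Namespace": "AWS/SQS",
        "MetricName": "ApproximateNumberOfMessagesVisible",
        "Dimensions": [{"Name": "QueueName", "Value": dlq_name}],
        "Statistic": "Maximum",
        "Period": 300,                 # evaluate over 5-minute windows
        "EvaluationPeriods": 1,
        "Threshold": threshold,
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        "TreatMissingData": "notBreaching",
        "AlarmActions": [sns_topic_arn],
    }
```

With a boto3 CloudWatch client, this would be applied as `cloudwatch.put_metric_alarm(**dlq_alarm_kwargs(...))`; similar definitions can cover the Lambda function's error and duration metrics.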

Also, we can examine logs of the Lambda function for certain error situations, retry patterns, or problems with the downstream services or with the retry logic itself. We can place logging lines judiciously in the function code to record pertinent information, including message attributes, retry attempts, and error details.

Future enhancements

The suggested approach provides a foundation for customized retry mechanisms, and there are some improvements we could consider to enhance its capabilities and flexibility even further.

A possible improvement would be to introduce dynamic retry intervals depending on the conditions of a downstream service or kinds of errors. Instead of being based on predefined backoff schemes, the system might dynamically adjust the retry intervals based on specific error types detected or in-service health monitoring in real time. This concept’s principal disadvantage is additional complexity, which might cause the failure of the retry process itself.

Another potential enhancement is the integration of the system with external configuration services such as Amazon DynamoDB or Parameter Store, a capability of AWS Systems Manager. That way, we can handle the retry configurations centrally and dynamically to provide ease of maintenance and modification in retry strategies without having to redeploy the Lambda function code.
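As a sketch of the Parameter Store variant, the retry settings could be stored as a JSON string and merged over defaults at invocation time. The parameter layout, key names, and default values here are assumptions; `get_parameter` is the boto3 Systems Manager call used to read the value.

```python
import json

# Hypothetical defaults used when a key is absent from the stored configuration
DEFAULT_RETRY_CONFIG = {
    "max_retries": 5,
    "base_delay_seconds": 60,
    "max_delay_seconds": 3600,
}


def parse_retry_config(raw_value):
    """Merge a JSON retry configuration over the defaults, ignoring unknown keys."""
    config = dict(DEFAULT_RETRY_CONFIG)
    if raw_value:
        overrides = json.loads(raw_value)
        for key in DEFAULT_RETRY_CONFIG:
            if key in overrides:
                config[key] = int(overrides[key])
    return config


def load_retry_config(ssm_client, parameter_name):
    """Read the configuration from Parameter Store via an SSM client."""
    response = ssm_client.get_parameter(Name=parameter_name)
    return parse_retry_config(response["Parameter"]["Value"])
```

Ignoring unknown keys keeps a mistyped or newer configuration from breaking the function, at the cost of silently falling back to defaults.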

It would also be possible to build advanced error analysis and reporting into the system. Through comprehensive reporting, analysis of error patterns, and correlation of failures with downstream service health, the system could then provide key insights for root cause analysis and proactive remediation.

Conclusion

It is often challenging to build scalable, robust serverless applications that might need to talk with external services. However, the proposed combination of Lambda, Amazon SQS, and EventBridge Scheduler offers a simple yet effective way to implement customized retry mechanisms. It gives the developer fine-grained control over the retry interval, supports scenarios such as exponential backoff, and works seamlessly with DLQs for persisting failures and EventBridge Scheduler for delayed retries of messages. The mechanism can also be reused more broadly for stateless queue consumers, not only for Lambda functions. This pattern enables developers to implement robust, fault-tolerant serverless systems that handle disruptions in downstream services gracefully.



FOSDEM 2025 – in Bulgarian

Post Syndicated from Vasil Kolev original https://vasil.ludost.net/blog/?p=3495

We did FOSDEM 2025.

I think this was the event into which the most people put the most effort. It went very well (or, well, with fewer problems than expected).

Things started at the previous edition, where Martijn Braam came to see us at the NOC and said “I was bored at one point and I made an audio mixer with a relatively simple board and a Teensy 4 as a controller”…
(also from his blog, our version of the mixer and the Ethernet switch)

So a new design of the box began (unofficially version 2.5). Version 2 was a 4-port USB network card with a switch in it, a capture card based on the MS2131 chip (with built-in loop-out) and a USB hub to hook them up to. Everything else was done from a laptop that we used as an encoder, to show status and all that sort of stuff.

The plan for the new design was:
– to remove the laptop
– to put a screen on the box
– and thus to save on any other hardware
– to build in an audio mixer
– to have a way to charge the microphone batteries on the go
– (quite brazenly, and we didn’t manage it) to build a receiver for the microphones into the mixer

Accordingly, Martijn, and Angel from our side, started designing the boards.
(this is where I realized how much I hate hardware)

I don’t know if I can tell you everything that happened within the year. The easiest way is to look at the history of the video repo.

A few interesting points:
cursor.c, a small thing that, when preloaded, makes every use of SDL hide the cursor. Otherwise, if you output to the HDMI port of the computer with ffmpeg, at least in Debian a cursor remains in the upper left corner (writing and testing the whole thing took less time than rebuilding ffmpeg, which is why it looks the way it does);
Assembly instructions, along with a plan for a production line/workshop for assembling many boxes by many people;

Martijn made a very nice video about the box, and he and Angel presented it in one of the FOSDEM talks. The recording is already uploaded.

For the main computer inside the box we chose a Radxa x4. Our requirements were that it be x86-64, have an Intel GPU (because they have mainline support for hardware encoding), and be affordable/available. There is a very useful site by Martijn, hackerboards.com, where you can search for whatever small computer you need. Initially we tested with the Radxa X2L, but it took up a lot of space, and there was a problem with the power supply (fixed in newer versions, but still). In the end, it turned out that for a similar amount of money we could get 110 X4s, with built-in eMMC flash and all sorts of extras (including bluetooth/wifi, which we didn’t use), and we settled on them.

I’ll spend a bit more time on the story of the fans, since there were many questions about why they sound the way they do. Without going into details that I don’t understand, the situation is as follows:
– on the power board we have a chip that can control the fans, according to the documentation either by a PWM value or based on RPM;
– RPM-based control doesn’t work. After I wrote an implementation that followed the datasheet to the letter and did nothing useful, I went to look at the source for the same thing in the Linux kernel, and it turned out that they don’t use RPM-based control there either;
– because of something in how things are wired, the chip may support one byte for PWM, but in practice only 3-4 values make a difference; at all the others the fan either stops or runs at maximum speed;
– Even though we bought identical, good fans, the different PWM values affect them differently. The temperature/humidity in the room, the phase of the moon and so on also have varying effects. As a result, the fans sometimes stop and their value has to be raised to spin them up again, which leads to whining;
– That whining can probably be heard in a fair share of the talks;
– Since the bottom of the box has a 2 mm metal plate that covers all of it and that we use as a heatsink, there was a discussion about turning the fans off altogether (because we couldn’t get the box to overheat during testing), but we didn’t want to risk it.

Unfortunately, the sound of these fans will be audible in many of the talks. If anyone comes up with a filter to remove it, drop us a line…

The metal plate on the bottom turned out to be an incredibly good idea. It all started from the fact that if we had followed the old mounting approach, we would have had to design and 3D-print lots and lots of different small parts to somehow attach the many boards to the few holes in the box. The idea of making one big board was dropped very quickly, since two of the boards weren’t made by us at all (the Radxa x4 and the HDMI capture card), and it would have made debugging and producing new revisions much harder. At one point we said to ourselves “why not just a piece of metal that we screw things into”, which almost as a side effect turned out to be an incredibly effective heatsink (before it, the Radxa x4 would overheat and hang within 10 minutes, the Radxa X2L would throttle its CPU down to 500MHz, and we were considering all sorts of cooling options).

In October and November, in between other things, we had more or less finished the design, and ordering of all the parts began. The good people at MiNoLab lent us a place where we could pile everything up, load and unload easily, and assemble. We gathered parts for 70 boxes and organized a sweatshop around Christmas to assemble them, with an additional step shortly after New Year to install the screens.
(the original plan was to assemble in initLab, but it would have been hard for us to gather there, and I don’t even want to think about how we would have carried the boxes of stuff up and down those stairs. That is why most of the testing and various other development activities happened in initLab)

I don’t know if I can explain what an incredible experience the assembly was. 10-15 people, in the days between Christmas and New Year and then on the first weekend of January, turned a pile of parts and gizmos into 70 working, installed and tested boxes, with which we judged that FOSDEM could happen (60 for the rooms and 10 spares). If we had taken one more week, we could have made more, but the event was getting too close, and it would have been difficult to keep using the assembly space.

In the end, on January 7th we managed to ship everything to Belgium, and it arrived there a week later.
This was cutting it very close, and we were about to activate the backup plan (which was to rent a truck and drive it there ourselves, and in the extreme case we could have finished assembling some of the boxes on the way)

To manage at all, we dropped the following features:
– loop-out through the ports of the Radxa instead of through the capture card. That way we would also have been able to show something different on the screen while nothing is plugged in. The main reason it’s missing is that we couldn’t find 110 short microHDMI-HDMI cables;
– Avoiding one cable by carrying the audio between the two boxes over the network. I got a proof of concept running with AES67, but there was no time left to make it production ready;
– The above was also blocked by the fact that, due to some synchronization problem, the audio mixer crashes if we use its USB audio. There are a few ideas for how to fix it, but we found out relatively late that the cause is a change that makes it run at 48KHz instead of 44.1KHz, which leads to the USB getting out of sync and reading the wrong memory;
– Again for lack of one cable and of time, we didn’t finish the feature for playing sound from the speaker’s presentation. Some rooms have no PA system, so it wouldn’t have been useful there, but in others it would have been, and more and more speakers want to play sound;
– One of the USB ports on the box was supposed to go directly to the Radxa, so that we could do various interesting things. We didn’t have a suitable cable for that either, but to some extent that’s for the better, since it is a live backdoor into the box and we haven’t quite figured out how to keep someone from plugging in a keyboard and making a mess…

But we had all the functionality from previous years, together with an audio mixer that was built in (one less thing to carry), and whose levels we could monitor in real time and change when needed (without having to send someone on site). As an added bonus, random people couldn’t change the sound settings, since the box has no buttons :). All in all, the audio mixer and everything around it deserve a separate post, which Albert (who wrote most of the code) has written 🙂

In the last week I managed to fix a few more things, such as also having a 480p stream (not only for people on lousy DSL, but also for people on the Wi-Fi at the university itself who couldn’t get into the rooms).

And so, we started FOSDEM 2025 with hardware prepared 3 weeks earlier, with known problems and not-sufficiently-tested. If we had had as many problems as I expected, I would have used this whole story as an example for my kids of what not to do.
About 30 of us went from Bulgaria, and we made up the larger part of the video team. Despite all my plans to gather volunteers from other places, it didn’t work out: I managed to run one online briefing exactly one week before FOSDEM, and couldn’t do it earlier (because the assembly wrapped up at the beginning of January, after which I got so sick that I’m still coughing). My hope is that, once we assemble the remaining boxes, we can see whether various hackerspaces might like them (because you can do quite interesting things with them) and gather more volunteers that way.

We had surprisingly few problems.

Our main problem came from the well-known issue with memory leaks in Voctomix. We had planned to migrate to Voc2mix, which was released in the summer, but we had no time left to wrestle with it (it would also have given us more capabilities, such as more than 2 audio channels and therefore a way to carry backup audio). This led to gaps in some talks, and there are a few ideas for how to fix it for next year.

The other more common problem (3-4 times) was that a certain input signal to the capture card manages to confuse it and stop its loop-out. It isn’t clear why; we don’t have the firmware source and can’t find documentation for the chip (Macrosilicon MS2131; if anyone happens to have some, I won’t say no).
(there is a suspicion that certain M1 MacBooks are the ones that drive the capture card crazy; in the end we’ll just ban them)

Otherwise, one or two boards failed on us, the power went out in one room, nothing got flooded, and despite delays with bringing up the network on Friday we were ready at a fairly normal time, so we got to bed on time. The teardown also finished calmly enough that everyone could have a proper meal.

We even managed to rewrite the way the audio levels are visualized (we split the two channels), moving all the level computation to a separate machine and database (to free up CPU time on the video mixers).

And I made the mistake of going to one of the dinners on Saturday evening, which dragged on quite a bit and, together with the fatigue from everything else, led to my having to sit down and take a nap somewhere on Sunday afternoon. Either I’ve lost my form, or I simply hadn’t recovered from being sick…

In the end we lent 2 boxes and one mixing laptop to FOSSASIA; there’s a chance they’ll use them soon 🙂

Among the new things this year: we put landline phones in the VOCs and in various other places, to see how it would go. We also hooked up a few mobile phones over SIP to the same exchange, for whoever wanted it. The experiment was a success, and the phones got a decent amount of use (and there are already requests to have them in more places). They might even partly replace the walkie-talkies (which I still can’t bring myself to like).

I can’t say much about FOSDEM itself. It was again a madhouse and full of people; as far as I know, this year they did a much better job of keeping the rooms from overfilling. For me the most interesting thing was the separate “junior” track, where kids go to workshops and are told useful things. This is something that we don’t seem to have at the moment (hackconf used to be aimed at school students, but it no longer exists), and it would be nice for it to appear.

And to finish with a big thank you to all the volunteers who helped and didn’t strangle me (since the whole exercise was not easy at all): without you this had no chance of happening, so thank you very, very, very much. And I’d be glad if you join again next year 🙂

FOSDEM 2025 – in English

Post Syndicated from Vasil Kolev original https://vasil.ludost.net/blog/?p=3494

We did FOSDEM 2025.

I think this was the event into which the most people put the most effort. It went very well (or, well, with fewer problems than expected).

Things started at the previous edition, where Martijn Braam came to see us at the NOC and said “I was bored at one point and I made an audio mixer with a relatively simple board and a Teensy 4 as a controller”…

(also from his blog, our version of the mixer and the Ethernet switch)

So a new design of the box began (unofficially version 2.5). Version 2 was a 4-port USB network card with a switch in it, a capture card based on the MS2131 chip (with built-in loop-out) and a USB hub to hook them up to. Everything else was done from a laptop that we used as an encoder, to show status and all that sort of stuff.

The plan for the new design was:
– to remove the laptop
– to put a screen on the box
– and, as a result, to save on other hardware
– to build an audio mixer
– to have a way to charge the microphone batteries on the go
– (quite brazenly, but it failed) to build a receiver for the microphones into the mixer

Accordingly, Martijn and Angel from our side started designing the boards.
(this is where I realized how much I hate hardware)

I don’t know if I can tell you everything that happened within the year. The easiest way is to look at the history of the video repo.

A few interesting points:
– cursor.c, a small thing that, when preloaded, makes every use of SDL hide the cursor. Otherwise, if you output to the computer’s HDMI port with ffmpeg, at least on Debian there is a cursor in the upper left corner (writing and testing the whole thing took less time than rebuilding ffmpeg, which is why it was done this way);
– assembly instructions, along with a plan for a production line/workshop for assembling many boxes by many people.

Martijn made a very nice video about the box, and he and Angel presented it in one of the FOSDEM talks; the recording is already uploaded.

For the main computer inside the box we chose a Radxa x4. Our requirements were that it be x86-64, have an Intel GPU (because they have mainline support for hardware encoding), and be affordable/available. There is a very useful site by Martijn, hackerboards.com, where you can search for any computer you need. Initially we tested with the Radxa X2L, but it took up a lot of space, and there was a problem with the power supply (fixed in newer versions, but still). In the end, it turned out that for a similar amount of money we could get 110 X4s, with built-in eMMC flash and all sorts of extras (including bluetooth/wifi, which we didn’t use), and we settled on them.

I’ll spend a little more time on the history of the fans, because there were a lot of questions about why they sound like that. Without going into details that I don’t understand, the situation is as follows:
– on the power board we have a chip that can control the fans, according to the documentation either with a PWM value or based on RPM;
– the RPM-based control doesn’t work. After I wrote an implementation that followed the datasheet verbatim and did nothing, I went to see the source for the same thing in the Linux kernel, and it turned out that they don’t use RPM-based control there either;
– due to some details of the wiring, the chip accepts a one-byte PWM value, but in practice only 3-4 values make a difference; with all the others the fan either stalls or runs at maximum speed;
– although we bought identical, nice fans, they do not all respond to the PWM values in the same way. Different temperature/humidity in the room, moon phase, etc. also have an effect. As a result, sometimes the fans stop and their value has to be increased for them to spin again, which leads to whining;
– that whine can probably be heard in a decent part of the lectures;
– Since we have a 2 mm metal plate on the bottom of the case that covers the whole case and we use it as a radiator, there was a discussion about whether to stop them altogether (because during the tests we were not able to overheat it), but we didn’t feel like taking the risk.
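
The stall-and-bump behaviour described in the list above can be sketched roughly as follows. This is not the code that ran on the boxes: the step values and the `read_rpm`/`set_pwm` callables are hypothetical stand-ins for whatever the power board actually exposes, and the only idea taken from the text is "if the fan has stopped, move to the next usable PWM value".

```python
from typing import Callable, Sequence

# Hypothetical: the handful of PWM values that actually make a difference.
USABLE_PWM_STEPS: Sequence[int] = (96, 128, 160, 255)


def next_pwm_index(current: int, rpm: int,
                   steps: Sequence[int] = USABLE_PWM_STEPS) -> int:
    """Return the index of the PWM step to use next.

    A reading of 0 RPM is treated as a stalled fan: move one step up,
    saturating at the highest step. Otherwise keep the current step.
    """
    if rpm == 0:
        return min(current + 1, len(steps) - 1)
    return current


def control_once(current: int,
                 read_rpm: Callable[[], int],
                 set_pwm: Callable[[int], None],
                 steps: Sequence[int] = USABLE_PWM_STEPS) -> int:
    """Run one iteration of the loop: read the RPM, pick a step, write it."""
    index = next_pwm_index(current, read_rpm(), steps)
    set_pwm(steps[index])
    return index
```

A loop like this only ever raises the value, which matches the whining described above; stepping back down once the fan is spinning would need hysteresis that is not sketched here.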

Unfortunately, the sound of these fans will be heard in many of the lectures. If someone comes up with a filter to remove it, please send it to us…

The metal plate on the bottom turned out to be an incredibly good idea. It all came from the fact that if we had followed the old principle of fastening, we would have had to design many different and small parts to be 3D-printed, which we can somehow attach to the few holes of the case so they can hold all the boards. The idea of making one big board was dropped very quickly, because two of the boards were not made by us at all (Radxa x4 and HDMI capture card), and it would have made the process of debugging and making new revisions very difficult. At one point we said to ourselves “why not just make one piece of iron and screw it in”, which almost as a side effect turned out to be an incredibly effective radiator (before that, the Radxa x4 would overheat and crash within 10 minutes, and the Radxa X2L would throttle its processor to 500MHz, and we were considering all sorts of cooling options).

In October and November, we had finished the design and started ordering all the parts. The good people at MinoLab provided us with a place to get all the pieces, so we could easily load and unload and assemble. We built 70 boxes out of parts we had by organizing a sweatshop at Christmas to assemble them, with an additional step shortly after the new year to install the screens.
(the original plan was to assemble at initLab, but it would have been difficult for us to fit there, and I can’t imagine how we would have carried the boxes of stuff up and down those stairs. That’s why most of the testing and various other development activities were at initLab)

I don’t know if I can explain what an incredible experience the assembly was. In the days between Christmas and New Year and then the first weekend of January, 10-15 people made 70 working, installed and tested boxes from a pile of parts and gadgets we had. We estimated that FOSDEM could happen with that many (60 for the rooms and 10 spare). If we had delayed another week, we could have built more, but the event was getting too close, and it would have been complicated to use the assembly site.

In the end, on January 7th we managed to send everything to Belgium, and it arrived there a week later.
This was very much on the verge, and we were about to activate the backup plan (which was to rent a truck and drive it there ourselves, and in extreme cases we could assemble some of the boxes on the way, in the back of the truck)

To be able to have something working, we removed the following functionalities:
– loop-out through the Radxa HDMI ports, instead of through the capture card. This way we could show something different on the screen while nothing was on. The main reason it wasn’t there was that we couldn’t find 110 pcs of short microHDMI-HDMI cables;
– avoiding one cable by carrying the audio between the two boxes over the network. I ran a proof of concept with AES67, but there was no time left to make it production ready;
– the above was also blocked by the fact that, due to some synchronization problem, the audio mixer crashes if we use USB audio on it. There are several ideas on how to fix it, but we found relatively late that the cause was a change that makes it work at 48 kHz instead of 44.1 kHz, which leads to USB desynchronization and reading the wrong memory;
– Again due to the lack of one cable and time we didn’t finish the functionality to be able to play sound from the lecturer’s presentation. Some rooms don’t have sound, so this wouldn’t be useful, but for others it would be, and more and more lecturers want to play sound these days;
– One of the USB ports on the box had to go directly to the Radxa, so we could do some interesting things. We didn’t have a suitable cable for it, but it was the better thing to do, because this is a live backdoor into the box and we haven’t figured out how to prevent someone from sticking a keyboard in and starting to make trouble…

But, we had all the functionality from previous years, along with an audio mixer that was built in (one less thing to carry), and whose levels we could monitor in real time and change when necessary (without having to send someone in the room). As an added bonus, random people couldn’t change the sound settings, because the box doesn’t have any controls :). In general, the audio mixer and things around it deserve a separate post, which Albert (who wrote most of the code) wrote 🙂

In the last week I managed to fix a few more things, like having a 480p stream (not only for people on crappy DSL, but for people on Wi-Fi at the university itself, who couldn’t get into the rooms).

So, we started FOSDEM 2025 with hardware that had been prepared 3 weeks earlier, with some known problems and not enough testing. If we had had as many problems as I expected, I would have used this whole story as an example for my children of what not to do.
About 30 of us went from Bulgaria, and we were the majority of the video team. Despite all my plans to gather volunteers from other places, it didn’t work out – I managed to do an online briefing exactly one week before FOSDEM, and I couldn’t do it any earlier (due to the assembly finishing in early January, after which I got so sick that I’m still coughing). My hope is that after we assemble the rest of the boxes, we can see whether various hackerspaces would like them (because you can do some interesting things with them), and that way we would gather more people who want to play with them and help at FOSDEM.

We had surprisingly few problems.

Our main problem came from the famous problem with Voctomix memory leaks – we had planned to migrate to Voc2mix, which was released in the summer, but we didn’t have time to test and integrate it (it would have given us more options, such as more than 2 audio channels and a way to carry backup audio). This led to holes in some lectures, and for next year there are a few ideas on how to fix it.

The other more common problem (3-4 times) was that a certain input signal in the capture card manages to screw it up and stop its loop out. It’s not clear why, we don’t have the firmware source and we can’t find documentation for the chip (Macrosilicon MS2131, if anyone has it, I won’t refuse).
(we suspect that certain M1 MacBooks are what makes the capture card go crazy; eventually we will ban them)

Otherwise, a couple of boards died, the power went out in one room, and nothing flooded. Despite delays due to the network being deployed on Friday, we were ready at a relatively normal time, so we could go to bed on time, and the tear-down was also completed calmly enough that everyone could eat normally.

We even managed to rewrite the way to visualize the sound levels (we separated the two channels), by moving the entire calculation of the levels to a separate machine and database (to free up processor time for the video mixers).
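
As an illustration of what computing the levels separately per channel can look like, here is a minimal sketch. It is not the code from the video repo: it assumes interleaved little-endian signed 16-bit stereo PCM and reports an RMS level in dBFS for each channel, and the function names are made up for this example.

```python
import math
import struct
from typing import Sequence, Tuple

FULL_SCALE = 32768.0  # magnitude of the most negative signed 16-bit sample


def rms_dbfs(samples: Sequence[int]) -> float:
    """RMS level of signed 16-bit samples in dBFS (-inf for silence)."""
    if not samples:
        return float("-inf")
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0:
        return float("-inf")
    return 20.0 * math.log10(math.sqrt(mean_square) / FULL_SCALE)


def stereo_levels(pcm: bytes) -> Tuple[float, float]:
    """Return (left, right) RMS levels for interleaved s16le stereo PCM."""
    count = len(pcm) // 2
    samples = struct.unpack("<%dh" % count, pcm[: count * 2])
    # Even positions are the left channel, odd positions the right one.
    return rms_dbfs(samples[0::2]), rms_dbfs(samples[1::2])
```

Doing this on a separate machine, as described above, means the video mixers only have to ship the audio (or the raw numbers) elsewhere instead of spending their own processor time on the calculation.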

And I made the mistake on Saturday night of going to the staff dinner, which dragged on for a long time and, together with the fatigue from everything else, led to the fact that I had to sit down and take a nap somewhere on Sunday afternoon. Either I’ve lost my form, or I just haven’t been able to recover from my illness…

We also lent 2 boxes and a mixing laptop to FOSSASIA, there’s a chance they’ll use them soon 🙂

Some of the new things this year – we’ve been putting fixed phones on the VOCs and in various other places, to see if they’d help. We also hooked up some SIP mobile phones to the same PBX, for those who wanted them. The experiment was a success, they were used quite a bit (and there are already people who want them in more places). They might even replace walkie-talkies to some extent (which I still can’t get used to).

I can’t say much about FOSDEM itself – it was still very large and full of people, but as far as I know this year they’ve kept the rooms from getting crowded much better. For me, the most interesting thing was the separate “junior” track, for kids to go to workshops and be shown useful things. This is something that seems to be missing here in Bulgaria (hackconf used to be student-oriented, but it’s gone now), and it would be nice if it re-appeared.

And to end with a big thank you to all the volunteers who helped and didn’t strangle me (because the whole exercise wasn’t easy at all) – without you this wouldn’t have had a chance to happen, so thank you very much. And I’d be happy if you joined again next year 🙂

Building AI-powered customer experiences using a modern communications hub

Post Syndicated from Osman Duman original https://aws.amazon.com/blogs/messaging-and-targeting/building-ai-powered-customer-experiences-using-a-modern-communications-hub/

Customers expect organizations to anticipate and seamlessly fulfill their needs, engaging them with personalized content when, where, and how they prefer. They want context-sensitive, dynamic interactions with nuanced conversations across all communication channels. Organizations are under growing pressure to modernize customer experience workflows to drive loyalty and improve operational efficiency. Leveraging the latest advancements in Generative AI (GenAI), such as hyper-personalization and Agentic AI, presents new challenges. Organizations require a scalable, reusable architecture to integrate GenAI into their customer engagement systems without a complete system overhaul, amid the disparate solutions they currently operate.

This blog post explores how to build an AI-powered modern communications hub using open-source GitHub samples that integrate SMS/MMS and WhatsApp services with GenAI capabilities. Organizations can create innovative AI-powered customer experiences with a quick proof-of-concept without disrupting existing systems.

In combination with Vector Databases and Retrieval Augmented Generation (RAG), GenAI makes it possible to reorganize knowledge into a single system and query from a single user interface through natural language conversation with a chatbot or virtual assistant. Funneling customer communications through a multi-channel communications hub linked with GenAI capabilities helps unify customer engagement mechanisms and streamlines the creation of rich customer experiences. Customers meet AI agents and Q&A bots on the communication channel that is convenient to self-serve their needs. Organizations can build communications-channel-agnostic customer experiences while collecting channel engagement event and conversational data into a centralized data store for real-time insights, ad-hoc queries, analytics, and ML training.

Solution overview

At the core of the solution is the Modern Communications Hub that connects digital communication channels with key GenAI services, like Amazon Bedrock and Amazon Q, along with AWS ML, database, storage, and serverless computing services. AWS End User Messaging and Amazon SES provide API level access to digital communication channels, offering secure, scalable, high-performance, and cost-effective services for enterprise applications to exchange SMS/MMS, WhatsApp, push and voice notifications, and email with customers.

A collection of open-source sample code, published in the AWS-samples GitHub repository, illustrates how to facilitate generative conversations on SMS/MMS and WhatsApp channels. This will be extended to include email services. Two key components form the foundation of the GenAI Integration Samples: the Multi-channel Chat with AI Agents and Q&A Bots and the Engagement Database and Analytics for End User Messaging and SES. We will simply refer to these as the Conversation Processor and Engagement Database in the solution diagram.

This diagram shows the solution architecture at Level 300.

The Conversation Processor receives customer messages via AWS End User Messaging and Amazon Simple Email Service (SES), stores the conversation details, and invokes the relevant Amazon Bedrock Agent. Amazon Bedrock Agents use Large Language Models (LLMs) and knowledge bases to analyze tasks, break them into actionable steps, execute those steps or search the knowledge base, observe outcomes, and iteratively refine their approach until completing the task along with a response. Alternatively, the Conversation Processor can function as a Q&A bot, in which case it uses Amazon Bedrock Knowledge Bases along with its RAG feature to generate an LLM answer and send it back on the same channel as the customer’s message.
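
The two paths described above (agent versus Q&A bot) could look roughly like the following sketch. It is not taken from the sample repository: the helper names are hypothetical, and the only AWS calls assumed are `invoke_agent` and `retrieve_and_generate` on the boto3 `bedrock-agent-runtime` client, which is passed in as a parameter so the functions stay easy to test.

```python
from typing import Any


def ask_agent(client: Any, agent_id: str, alias_id: str,
              session_id: str, text: str) -> str:
    """Send one customer message to a Bedrock Agent and join the streamed reply."""
    response = client.invoke_agent(
        agentId=agent_id,
        agentAliasId=alias_id,
        sessionId=session_id,  # hypothetical choice: one session per sender
        inputText=text,
    )
    parts = []
    for event in response["completion"]:
        chunk = event.get("chunk")
        if chunk:  # other event types (for example traces) are skipped
            parts.append(chunk["bytes"].decode("utf-8"))
    return "".join(parts)


def ask_knowledge_base(client: Any, knowledge_base_id: str,
                       model_arn: str, text: str) -> str:
    """Answer a question with RAG over a Bedrock Knowledge Base."""
    response = client.retrieve_and_generate(
        input={"text": text},
        retrieveAndGenerateConfiguration={
            "type": "KNOWLEDGE_BASE",
            "knowledgeBaseConfiguration": {
                "knowledgeBaseId": knowledge_base_id,
                "modelArn": model_arn,
            },
        },
    )
    return response["output"]["text"]
```

In a deployment the client would typically come from `boto3.client("bedrock-agent-runtime")`, and the returned text would be sent back over the channel the customer used.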

The Engagement Database collects and combines customer engagement data and conversational logs from across communication channels, storing the information in a centralized data lake on Amazon S3. By converting the data into a common, canonical format, the solution simplifies querying and analysis of these inbound events. A Lambda Transformer function leverages Apache Velocity Templates to transform the incoming JSON data, enabling real-time insights.
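
To make the idea of a common, canonical format concrete, here is a small sketch of the kind of mapping such a transformer performs. The actual solution uses Apache Velocity Templates inside a Lambda function; this Python version only illustrates the concept, and the per-channel field names and the canonical record layout are assumptions made up for the example.

```python
import json
from typing import Any, Dict

# Hypothetical per-channel mappings: canonical field name -> source field name.
FIELD_MAPS: Dict[str, Dict[str, str]] = {
    "sms": {
        "sender": "originationNumber",
        "recipient": "destinationNumber",
        "body": "messageBody",
    },
    "whatsapp": {
        "sender": "from",
        "recipient": "to",
        "body": "text",
    },
}


def to_canonical(channel: str, raw_json: str) -> Dict[str, Any]:
    """Map a channel-specific JSON event onto one shared record layout."""
    event = json.loads(raw_json)
    mapping = FIELD_MAPS[channel]
    # Missing source fields become None rather than raising an error.
    record = {name: event.get(source) for name, source in mapping.items()}
    record["channel"] = channel
    return record
```

Once every event has the same keys, a single query can cover all channels instead of needing one query per source format.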

The raw event data stored in the Amazon S3 data lake can then be fed into other AWS services for further processing. For example, the data can flow into Amazon Connect Customer Data Profiles or Amazon SageMaker to support machine learning model training. Data analysts can use Amazon Athena to issue direct queries for detailed ad-hoc reporting, or to send the data to Amazon QuickSight for advanced visualizations and natural language querying capabilities through Amazon Q in QuickSight.

NOTE: There is the potential for end users to send Personally Identifiable Information (PII) in messages. To protect customer privacy, please consider using Amazon Comprehend to assist in redacting PII before storing messages in S3. The following blog post provides a good overview of how to use Comprehend to redact PII: Redact sensitive data from streaming data in near-real time using Amazon Comprehend and Amazon Kinesis Data Firehose.
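
A minimal sketch of the redaction step, assuming the entity list has the shape returned by Amazon Comprehend's `detect_pii_entities` call (`Type`, `Score`, `BeginOffset`, `EndOffset`). The `redact` helper and the 0.5 score threshold are illustrative choices, not part of the solution.

```python
from typing import Any, Dict, List


def redact(text: str, entities: List[Dict[str, Any]],
           min_score: float = 0.5) -> str:
    """Replace each detected PII span with its entity type in brackets."""
    # Work from the end of the string so earlier offsets stay valid.
    ordered = sorted(entities, key=lambda e: e["BeginOffset"], reverse=True)
    for entity in ordered:
        if entity.get("Score", 1.0) < min_score:
            continue  # skip low-confidence detections
        text = (text[: entity["BeginOffset"]]
                + "[" + entity["Type"] + "]"
                + text[entity["EndOffset"]:])
    return text
```

With boto3, the entities would come from something like `comprehend.detect_pii_entities(Text=message, LanguageCode="en")["Entities"]`, and the redacted string is what gets written to S3.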

Amazon Bedrock provides core GenAI capabilities such as LLMs, Knowledge Bases, Retrieval Augmented Generation (RAG), AI agents, and Guardrails, to understand customer asks, determine what action to take, and what to communicate back. Amazon Bedrock Knowledge Bases provide organization specific business knowledge and reasoning, while Amazon Bedrock Agents automate multistep tasks by seamlessly connecting with company systems, APIs, and data sources.

Prerequisites

The following prerequisites are necessary to build your modern communications hub:

  • An AWS account. Sign up for an AWS account at AWS website if you don’t have one.
  • Appropriate AWS Identity and Access Management (IAM) roles and permissions for Amazon Bedrock, AWS End User Messaging, and Amazon S3. For more information, see Create a service role for model import.
  • AWS End User Messaging Configuration: You’ll need to configure the necessary origination identity in the AWS End User Messaging service to deliver messages via SMS or WhatsApp. If configuring SMS, a registered and active SMS Origination Phone Number must be provisioned in AWS End User Messaging SMS. (Within the United States, use 10DLC or Toll-Free Numbers (TFNs).) If configuring WhatsApp, an active number that has been registered with Meta/WhatsApp should be provisioned in AWS End User Messaging Social.
  • Amazon Bedrock models: Bedrock Anthropic Claude 3.0 Sonnet and Titan Text Embeddings V2 enabled in your region. Note that these are the default models used by the solution; however, you are free to experiment with different models.
  • Docker Installed and Running – This is used locally to package resources for deployment.
  • Node (> v18) and NPM (> v8.19) installed and configured on your computer
  • The AWS Command Line Interface (AWS CLI) installed and configured
  • AWS CDK (v2) installed and configured on your computer.

Deploy the Conversation Processor and Engagement Database

Deploy the following two solutions. While not required, it is best to deploy them in this order, as outputs from the Engagement Database can be used in the Multi-Channel Chat example:

  1. Engagement Database and Analytics for End User Messaging and SES
  2. Multi-channel Chat with AI Agents and Q&A Bots

Each solution contains detailed instructions to deploy the required services using the AWS Cloud Development Kit (CDK). The first Engagement Database solution will create an Amazon Data Firehose stream that can be used as an input to the second Multi-Channel Chat application so that data can be stored and queried in the Engagement Database.

Multi-Channel Chat with AI Agents and Q&A Bot Data Sources
This solution demonstrates how users can interact with three different knowledge sources. You may not need all three; however, this should serve as a good example for building the right knowledge source for your particular use case:

NOTE: The starter project creates an S3 bucket to store the documents used for the Bedrock Knowledge Base. Please consider using Amazon Macie to assist in the discovery of potentially sensitive data in S3 buckets. Amazon Macie can be enabled on a free trial for 30 days, up to 150GB per account.

  • Build your Knowledge Base on Amazon Bedrock using a Web Crawler. Optionally configure your knowledge base to scan or crawl website(s) to populate your knowledge base.
  • Amazon Bedrock Agents: Optionally enable your users to chat with an Amazon Bedrock Agent. Agents have the added benefit of supporting knowledge bases for answering questions and walking users through collecting the information needed to automate a task such as making a reservation. Sample agents are available in the Amazon Bedrock Agent Samples repository. Note that you will need to have an Amazon Bedrock Agent created in your region prior to deploying the solution.

Conclusion

A Modern Communications Hub, loosely coupled with core Generative AI services, will establish a composable foundation to build communication-channel-agnostic customer experiences on. Build one by leveraging the GenAI Integration Samples, Conversation Processor and Engagement Database, combining with the secure, scalable, high-performance, and cost-effective digital communication services by AWS End User Messaging and Amazon SES. This will provide a single point of conversational access to knowledge bases and agentic AI capabilities on Amazon Bedrock. Start experimenting with AI-powered customer experience innovations with a quick proof-of-concept that won’t interfere with your present customer engagement setup.

[$] Smarter IRQ suspension in the networking stack

Post Syndicated from corbet original https://lwn.net/Articles/1008399/

High-performance networking is a highly tuned activity; the amount of time
available to deal with each packet may be measured in nanoseconds, so care
must be taken to avoid anything that might slow the process down.
Recently, there has been a fair amount of attention given to a patch set
merged for 6.13
that, it is claimed, can improve processing efficiency
(and, thus, power savings)
in data centers by as much as 30%. The change itself, contributed by Joe
Damato and Martin Karsten, is a relatively small tweak to existing
optimization techniques; it shows just how much care is needed to optimize
a high-bandwidth server.

Plasma 6.3 released

Post Syndicated from corbet original https://lwn.net/Articles/1008971/

Version 6.3 of
the Plasma desktop has been released.

One year on, with the teething problems a major new release
inevitably brings firmly behind us, Plasma’s developers have worked
on fine-tuning, squashing bugs and adding features to Plasma 6 —
turning it into the best desktop environment for everyone.

Changes include improved support for
drawing tablets, better fractional-scaling support, and more.

Security updates for Tuesday

Post Syndicated from corbet original https://lwn.net/Articles/1008966/

Security updates have been issued by AlmaLinux (firefox, tbb, and thunderbird), Debian (cacti, libtasn1-6, and rust-openssl), Oracle (galera and mariadb, kernel, raptor2, and thunderbird), SUSE (bind, fq, java-21-openj9, libtasn1-6-32bit, ovmf, python310, python312, python313, python314, rime-schema-all, thunderbird, and wget), and Ubuntu (eglibc, firefox, glibc, linux, linux-aws, linux-lts-xenial, ruby2.3, ruby2.5, and vim).

How To Protect Your Organization’s Bluesky Account From Security Threats

Post Syndicated from Chris Boyd original https://blog.rapid7.com/2025/02/11/how-to-protect-your-organizations-bluesky-account-from-security-threats/

When a new platform suddenly becomes popular, it’s not uncommon to see it stress tested by malware authors and fraudsters. Many organizations are making the leap to Bluesky without necessarily understanding the potential threats to an account and the business should a compromise take place.

This blog explains how to secure your Bluesky account from security threats such as malware and phishing, as well as establishing your identity to help prevent fraud and impersonation.

We will discuss:

  • What is Bluesky: How it works, what you can do with your data, and why you can keep using it when it’s time to move on.
  • Security and privacy settings: How you can keep your corporate account safe from harm.
  • Using your domain for identity verification: Setting your organization’s domain as the username for both the main account and employees.
  • Content and moderation: Steering your corporate account away from dubious content.

If you’ve recently been tasked with guiding your organization to social media breakout Bluesky, read on to see how you can get your team set up securely.

What is Bluesky?

Bluesky is a social network platform built on the Authenticated Transfer Protocol (ATProto), an “open, decentralized network for building social applications.” One of the intended benefits is that you own your own data. It can be moved to different services thanks to Decentralized Identifiers (DIDs), which keep your services and user identity clearly separated. In theory, should Bluesky go away, you’ll be able to port your data elsewhere and keep your social graph intact.

Security and privacy settings

Bluesky’s security options may appear to be on the modest side, with 3 settings available in the “Privacy and Security” tab:

  • 2-factor authentication (2FA).
  • App passwords.
  • Logged-out visibility.

2FA: At time of writing, email is the only form of 2FA available. Enabling this option will result in email codes sent to your registered email address. These codes are required to be able to log into your account. To disable 2FA, you would need to approve a verification email sent to the same registered address.

This is not as robust an approach as using an authentication app or hardware key verification. If someone compromises your registered email address via phishing or malware, they’ll be able to disable email verification without you knowing and potentially hijack your account.

As a result, Rapid7 recommends you secure your registered email account with multi-factor authentication (MFA) alongside Single Sign-On (SSO).

2FA is still better than having no protection in place at all. In 2024, the US Securities and Exchange Commission (SEC) had its X account compromised because of a SIM swap attack, and the account was confirmed as having no 2FA enabled. Before the account could be recovered, a rogue post caused the price of Bitcoin to jump and then plummet in the space of a few minutes.

App passwords: These are codes generated by Bluesky which you can use for third-party apps, without having to give said apps your Bluesky password. The code can be deleted from your account at any time, and you can also specify whether or not the code grants access to your direct messages. Valid codes are 19 characters long, including 3 dashes, and can only be viewed at time of generation; if you don’t copy it, you’ll have to create a new one.

Logged-out visibility: Bluesky currently has no private account option — everything is public by default. This option requests that users be logged in before being able to access your content. A note of caution: Bluesky warns that “other apps may not honor this request.” It’s trivial to see content while not logged in, so if this is a deal breaker for your business, you may be better off waiting for more granular privacy controls.

Using your domain for identity verification

One of Bluesky’s core features is using DNS management to present the same user identity across the (eventually) federated Bluesky landscape. It makes use of ATProto to offer this functionality, so if you want to verify your account on Bluesky you’ll need to do it via one of your domains. The end result is that your username will be your organization’s web address, like so:

bsky.app/profile/rapid7.com

You can also offer subdomains to all of your employees, who will display as “@theirname.yourbusinessname.com” or similar.
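
For readers who want to see what the domain check involves, here is a small sketch. It assumes the DNS-based handle resolution described in the ATProto specification, where a TXT record at `_atproto.<handle>` carries a value of the form `did=<DID>`. The helper names are made up for this example, and the actual DNS lookup is left to whatever resolver you already use.

```python
from typing import Iterable, Optional

PREFIX = "did="


def txt_record_name(handle: str) -> str:
    """DNS name at which the TXT record for a handle is published."""
    return "_atproto." + handle.lower().rstrip(".")


def did_from_txt(values: Iterable[str]) -> Optional[str]:
    """Extract the DID from TXT record values of the form 'did=<DID>'.

    Returns None when there is no such value, or when several different
    DIDs are published (an ambiguous record should not verify).
    """
    dids = set()
    for value in values:
        cleaned = value.strip().strip('"')
        if cleaned.startswith(PREFIX):
            dids.add(cleaned[len(PREFIX):])
    return dids.pop() if len(dids) == 1 else None
```

For an employee handle, the same lookup is done against the subdomain, so each subdomain needs its own record (or an equivalent HTTPS-based check, which is not sketched here).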

This is useful in relation to verification and identity because closing a social media account often requires an exit plan. You can’t just abandon an account; it could end up being hijacked or forgotten about, with sensitive information lurking in direct messages. You can’t just delete it either, because anyone could grab your old username and use it for nefarious purposes.

Bluesky’s approach enables you to retain the same official username across multiple eligible platforms, and neatly sidesteps any issues arising from platform-specific verification schemes which may be changed, abandoned, or replaced entirely.

There are still some potential issues to consider here. Once the domain-centric username is enabled, your old account will be released back into the wild. This means someone else could register it, and pretend to be your organization. They could then mount phishing campaigns under your brand, or send out malware links under the guise of business-centric activities. You’ll need to be ready to register the old username via another secure email address, and then park it safely to one side while not forgetting to enable 2FA.

This is still largely an improvement on the fate of other more well-known verification programs. When X changed the blue check system to paid premium access, the social media platform endured a wave of “verified” fakes. Elsewhere in 2022, a fake (but verified) pharmaceutical company account claimed that insulin was now “free.” This incident caused the real company’s stock to fall by 4.37%, and even arguably caused multiple advertisers to leave the platform itself.

Content and moderation

Bluesky has a variety of moderation features to steer your account away from scams, phishing, and malware. In addition to being able to mute specific words and tags, Bluesky also makes use of moderation lists, i.e., packs containing multiple users related to specific topics. You’ll find lists for cryptocurrency spammers, pornography bots, content scrapers, and even imitation accounts.

Under the Content Filters setting (found under “Settings > Moderation”), you can select “show”, “warn”, or “hide” for a variety of content including adult content and graphic media. With the recent introduction of video, there’s also the option to not automatically play said content. Additionally, you can enable or disable external media players for services like YouTube, Vimeo, and SoundCloud.

You can take this one step further via “Moderation > Advanced”, where controls allow you to use an “Off, Warn, Hide” setting for a variety of topics such as threats, security concerns, misinformation, scams, and spam, as well as the possibility of many others outside of Bluesky’s pruning defaults. This is done via stackable “labels” through third-party labelling moderation services, designed to work on top of default Bluesky moderation settings. If you select the hide setting for “malware spammers”, then all third-party labelled malware spammer accounts will be hidden from view thus limiting your exposure to multiple security threats.

In 2021, Cardiff University researchers highlighted that a large number of drive-by malware links posted to social media tended to include negative and fear-laden messaging. Said messages were 114% more likely to be reposted than more benign content. Bluesky’s moderation tools also allow you to filter out posts labelled as containing intolerance, rudeness, and threats. Enabling these moderation options will reduce the possibility of similar rogue posting strategies leading to compromise by malware, social engineering, or system exploits.

Go forth and be social

Security threats propagated through social media date back to the early days of MySpace and Orkut. Even back then, techniques had shifted away from trolling and pranks to data theft via banking trojans and the spread of phishing links via direct messaging. Today’s newer platforms have employed many lessons learned from the mistakes of their forefathers; however, they are not impenetrable.

By making use of the various security and identity settings highlighted above, you’ll be ensuring your business has a more robust approach to tackling data theft, malware infections, and wider network infiltration via the frequently vulnerable underbelly of social network platforms.

Offline celebrations: how Christmas, NYE, and Lunar New Year festivities shape online behavior

Post Syndicated from João Tomé original https://blog.cloudflare.com/offline-celebrations-how-christmas-nye-and-lunar-new-year-festivities-shape-online-behavior/

Now that 2025 has been here for a few weeks and 2024 has closed with a variety of year-end traditions — from Christmas and Hanukkah celebrations to New Year’s Eve (NYE) countdowns, as well as celebrations of Orthodox Christmas, and Lunar/Chinese New Year — let’s examine how these events have shaped online behavior across continents and cultures. Reflecting on Christmas and NYE 2024 provides insights into how these trends compared with those of the previous year, as detailed in an earlier blog.

One notable finding is the remarkable consistency in human online patterns from one year to the next, a trend that persists despite cultural differences among countries. Data from over 50 countries reveal how people celebrated in 2024–2025, offering a timely reminder of typical holiday trends. While Christmas remains a dominant influence in many regions, other cultural and religious events — such as Hanukkah and local festivities — also shape online habits where Western traditions hold less sway.

In regions where Christmas is deeply rooted, Internet traffic dips significantly during Christmas Eve dinners, midnight masses, morning gift exchanges, and Christmas Day lunches, a pattern evident in both our previous and current analyses.

This analysis focuses exclusively on non-bot Internet traffic, filtering out automated activity to highlight genuine human behavior during the most recent holiday season. Before going into specific countries, here’s a global hourly snapshot (UTC-based) of Christmas and New Year’s Eve 2024 traffic from the Cloudflare Radar Data Explorer.


This worldwide perspective captures notable drops across a 23-hour window, from New Zealand to Hawaii. Globally, December 25 saw a 19% drop in traffic from the previous week, followed by December 24 with a 14% drop. This holiday period also included the four days with the lowest global traffic during the period between October 1, 2024, and February 6, 2025. In descending order, these days were: December 25, December 24, January 1, 2025, and December 31, 2024.

Some key takeaways:

  • Europe: Christmas Eve drops in Internet traffic reached up to 67% (seen in Denmark; Spain reached 66%).

  • Americas: December 25 was key, with drops ranging from 26% in the US to 70% at midnight in Argentina.

  • Regional timing differs: Nordic countries on Christmas Eve disconnect earlier at around 18:00, Southern Europe at 21:00-22:00, and Latin America even later.

  • New Year’s shows worldwide impact, strongest in Latin America: a 73% drop in Chile, followed by a 68% drop in Argentina.

  • Lunar New Year: January 29 is a peak offline moment, with drops of 25% in Hong Kong, 23% in Singapore, and 24% in Vietnam.

Note: Unless otherwise noted, all times used in this blog post are local ones; in countries with several timezones, we’re using the timezone where more people live. For the US, Eastern time is used.

Global Christmas and New Year’s Eve daily trends

In this analysis, we apply the same methods as our previous blog post to rank countries and regions by their lowest holiday traffic dates, showing each day’s percentage drop. Many locations, such as the United States, experience clear dips on December 24 and 25 as people disconnect for Christmas Eve and Christmas Day celebrations. In contrast, some regions show smaller declines on December 31 as the New Year approaches. The order and magnitude of these drops vary by country, reflecting cultural nuances — some nations register their largest drop on Christmas Eve, others on Christmas Day, and still others exhibit unique patterns around New Year’s Eve or January 1.

Below is a world map highlighting where traffic dropped the most on December 24 or 25; darker colors indicate larger drops based on our analysis.

In the following table, we provide more details than can be shown in the map. The data focuses only on locations that had their lowest traffic days between December 24-25 and December 31-January 1, along with the respective percentage drop on each of those days compared to the previous week (where applicable).
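The week-over-week comparison described above can be sketched roughly as follows. This is a minimal illustration, not the method actually used for the analysis: the function name and the traffic values are hypothetical, and each day is simply compared with the same weekday one week earlier.

```python
from datetime import date, timedelta


def weekly_change(daily_traffic, day):
    """Percent change for `day` relative to the same weekday one week earlier."""
    baseline = daily_traffic[day - timedelta(days=7)]
    return (daily_traffic[day] - baseline) / baseline * 100


# Hypothetical daily traffic totals in arbitrary units (not Cloudflare data).
daily_traffic = {
    date(2024, 12, 17): 1000,
    date(2024, 12, 18): 1000,
    date(2024, 12, 24): 860,
    date(2024, 12, 25): 810,
}

holiday_days = [date(2024, 12, 24), date(2024, 12, 25)]
changes = {d: weekly_change(daily_traffic, d) for d in holiday_days}

# Rank the days from the largest drop to the smallest.
lowest_first = sorted(changes, key=changes.get)
for d in lowest_first:
    print(d.isoformat(), f"{changes[d]:+.0f}%")
```

Comparing against the same weekday keeps the normal weekly rhythm (weekday versus weekend traffic) out of the result.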

Top days with the lowest Internet traffic in December 2024 – January 2025

(with respective percentage drops, if any, from the previous week)

Location

December 24

December 25

December 31

January 1

Denmark

-42%

-19%

Portugal

-34%

-29%

Poland

-33%

-24%

Norway

-33%

-15%

Spain

-31%

-26%

Switzerland

-30%

-28%

Finland

-30%

-22%

Austria

-29%

-19%

Ireland

-28%

-31%

Chile

-28%

-25%

-5%

Czech Republic

-28%

-16%

Sweden

-28%

-11%

Colombia

-26%

-35%

-5%

-8%

Italy

-26%

-31%

-1%

Argentina

-25%

-30%

-3%

Belgium

-25%

-23%

-1%

France

-24%

-24%

Mexico

-24%

-21%

-1%

Germany

-24%

-16%

United Kingdom

-22%

-32%

Brazil

-22%

-23%

-2%

-1%

United States

-21%

-26%

Canada

-20%

-22%

Netherlands

-19%

-30%

-8%

Australia

-19%

-29%

New Zealand

-18%

-27%

Greece

-17%

-22%

-5%

Romania

-16%

-12%

-7%

South Africa

-12%

-31%

-4%

Nigeria

-10%

-17%

Japan

-6%

-6%

Philippines

-4%

-6%

-5%

-3%

In cultures with a strong Christmas tradition — mostly in the West — people generally go offline on Christmas Eve (December 24) or Christmas Day (December 25). In regions where Christmas is less culturally significant, key offline moments occur on other dates, such as December 31 or January 1.

In Europe, most countries (including Denmark, Norway, Spain, Portugal, Switzerland, Finland, Czech Republic, Germany, France, Poland, Sweden, Austria, the United Kingdom, Italy, Ireland, Belgium, and Romania) experience their largest traffic drop on December 24, making Christmas Eve the primary offline moment. Some countries also exhibit a less significant drop in traffic on December 25 or December 31.

North America and Latin America display similar patterns, with the United States, Canada, and Mexico showing the largest drop on December 25. In Latin America — specifically in Argentina, Chile, and Colombia — December 25 also sees a significant decline, though in some cases January 1 emerges as a key offline moment, indicating slight variations in local celebration timing.

In Asia, the traffic drops are milder. For example, Japan experienced only modest declines on December 24 and 25, while in the Philippines, January 1 recorded a 3% drop, compared with a 6% drop on December 25. In Hong Kong, Singapore, and Malaysia, the influence of Lunar/Chinese New Year is more pronounced; however, Christmas Day 2024 still registered noticeable declines of 12%, 13%, and 9% in these locations, respectively. Meanwhile, in Indonesia and Turkey, December 31 is their peak low-traffic day, suggesting that Christmas plays a less central role in their offline behavior.

As an example, here’s the US perspective from Cloudflare Radar Data Explorer, where the drop in traffic during Christmas 2024 and New Year’s 2025 is evident:


Comparing Christmas 2023 with 2024, most European regions experienced a stronger traffic drop on their key Christmas day — whether December 24 or December 25 — than in the previous year. The ranking of the days with the lowest traffic sometimes shifts, with new dates such as December 23 or January 1 entering the top three. In North and Latin America, while December 24 and 25 remain important, January 1 has also emerged in several cases. 

Orthodox Christmas impact

In countries that celebrate Orthodox Christmas (January 7), Internet traffic follows a distinct pattern. During the December 25 Christmas period, the drops are relatively modest — for example, Russia sees a 6% decrease on December 25, while Romania and Ukraine register declines of 16% on December 24 and 12–13% on December 25. However, because traffic falls significantly on December 30–31 — even more so than on December 24–25 — the levels on January 6–7 are considerably higher compared with the previous week. In fact, a notable surge occurs on January 7 compared with December 31, with traffic increasing by 30% in Russia, 32% in Romania, 24% in Ukraine, 31% in Belarus, and 15% in Kazakhstan.

Below is a daily chart of Internet traffic in Russia, which clearly shows the December 30–31 drop and a strong rebound in the following days of the new year. Notably, there is a slight decline on January 6, 2025 — the Orthodox Christmas Eve — registering a 4% drop compared with the previous day.


Where Christmas isn’t central

Not every country’s December revolves around Christmas. Hanukkah’s timing changes each year, influencing when people log off. In 2024, Hanukkah started on the evening of December 25, leading to a 5% drop in traffic in Israel, followed by 4% drops on the next two days. (Hanukkah lasted until January 2, 2025.) Looking at a more granular view, traffic dropped ~15% between 14:45 and 20:00 in Israel on December 25. The chart below highlights the days that Hanukkah was celebrated.


In 2023, Hanukkah began on December 7, leading to an 8% traffic drop in Israel that day and a 7% decline on the following days. More granular data shows that on December 7, traffic dropped the most around 17:00, reaching as much as 17%.

In Saudi Arabia, Turkey, Egypt, and Indonesia, the lowest traffic days don’t align with December 24-25. In those regions, Ramadan is a much more impactful event, as we’ve noted in previous blog posts. Meanwhile, in other regions such as China, Hong Kong, Singapore, Vietnam, Taiwan, and South Korea, Lunar New Year plays a much bigger role, as we’ll analyze in more detail below.

Now, let’s focus on a more granular perspective of these trends, showing the impact of Christmas dinners and lunches, and also New Year’s Eve drops in traffic.

A more granular perspective of Christmas


Europe

The Christmas 2024 data show that in Europe, as we saw in the previous year, the stronger traffic drop still occurs during Christmas Eve dinner. In Spain, for example, there is a 66% drop compared with the previous week at 21:45, while the morning and lunch periods on Christmas Day see further declines of 55% at 08:00 and 47% at 15:30. Denmark recorded a 67% drop at 18:45 and a 50% drop the next morning at 07:00. Poland and the Czech Republic experience steep dinner declines, with drops as high as 60% (17:15) and 55% (17:45) respectively, followed by substantial drops in the early morning. France, Portugal, Italy, Switzerland, and Germany follow similar patterns, with dinnertime drops ranging between 46% and 57%, along with additional significant declines during the morning or lunchtime hours.

A closer look at timing reveals interesting regional differences that are also related to typical dinner times. In Nordic countries such as Denmark, Norway, Sweden, and Finland, as well as in Poland, the Christmas Eve dinnertime drop in traffic happens relatively early: Denmark’s is at 18:45, and Norway’s occurs around 17:45 to 18:15, with Sweden and Finland also showing early declines. A similar pattern appears in the Czech Republic (17:45). Some countries show mixed trends, such as the UK, which sees a 34% drop in traffic at both 16:15 and 20:30; Switzerland, with 47% at 19:00 and 50% at 21:00; and Germany, with 46% at 19:15.

In contrast, many Latin and Southern European countries experience peak drops later in the evening (this includes Latin America, as we’ll highlight below). Spain, for instance, reaches its maximum drop at 21:45, while Italy and Portugal see the largest declines at 21:15. Greece records its biggest drop between 21:45 and 22:45, at 37%. Romania and France, for example, are slightly earlier, at 20:45. These early or late traffic drops reflect local dinner traditions, which vary by region.

Americas

In the Americas, holiday patterns continue to reflect a mix of cultural traditions. In the United States, Christmas Eve sees a 30% drop between 19:45 and 20:45, aligning with family gatherings, while Christmas Day mornings record a 39% decline at 09:30 and a 33% drop at 13:15, highlighting the quiet start to the day. It’s similar in Canada, both in the drop (35%) and the time (20:30), but Mexico aligns more closely with South American countries.

In Latin America, Christmas Eve (Nochebuena) remains the key period of reduced Internet usage, and the following trends are consistent with Christmas 2023. Significant traffic declines align with late-night traditions like the Midnight Toast (in Argentina, the late-night feast is especially popular) and Misa de Gallo (Midnight Mass). For example:

  • Chile: -62% at 22:45, -63% at midnight (December 25)

  • Argentina: -60% at 22:15, -70% at midnight

  • Colombia: -49% at 22:15, -34% at midnight

  • Peru: -47% at 22:30, -53% at midnight

  • Mexico: -48% at 22:30, -40% at midnight

  • Brazil: -46% at 22:00

Asia Pacific

In the Asia Pacific region and other parts of the world, the reduction in online activity is noticeably milder. Countries such as Indonesia, Japan, South Korea, and Thailand record much smaller drops at Christmas Eve dinner and in the morning. For instance, Japan’s dinner drop is only 11%, while South Korea’s is 18%.

Singapore, Hong Kong, Malaysia, and the Philippines show more variability, with some moderate dinnertime drops but stronger declines later in the day in places like Singapore and Hong Kong. New Zealand and Australia, in the Southern Hemisphere, experienced drops of 29% and 30%, respectively, at dinner, followed by even deeper declines in the morning and early afternoon.

Middle East and Africa

Turning to the Middle East and Africa, the trends reflect regional cultural differences. In these areas, the reduction in online activity is generally less dramatic than in predominantly Christian regions. Nigeria, for example, shows a 20% drop at dinner (with additional declines at later times). Our analysis also includes other Middle Eastern locations such as the United Arab Emirates, which registers a relatively modest 12% drop at Christmas Eve dinner with deeper declines later in the day.

In previous blog posts, we have shown how events like Ramadan clearly impact Internet traffic in countries with large Muslim populations. One example from our Year in Review 2024 highlights Indonesia and the United Arab Emirates, where traffic dropped during Eid al-Fitr, the festival marking the end of Ramadan (April 9-10, 2024).


Boxing Day trends

Boxing Day on December 26 shows a sharp rebound in online activity after the significant drop in traffic during Christmas. In the UK, Canada, Australia, and New Zealand, traffic recovered as people returned online after the Christmas break. Daily traffic in the UK and Canada was still lower than in the previous week (-2% and -3%, respectively), but it was much higher than on Christmas Day (+42% in the UK and +24% in Canada). Traditionally associated with charitable activities, family gatherings, and shopping, the day sees traffic spikes across these regions:

Location | December 26 increase/decrease in daily traffic | Peak traffic increase on December 26
Australia | +6% | December 26, 10:00: +12%
United Kingdom | -2% | December 26, 12:45: +7%
Canada | -3% | December 26, 12:15: +1%
New Zealand | +2% | December 26, 10:30: +7%, 17:15: +11%

Christmas traffic drops in more detail

Here is the list of locations that saw a clear drop in traffic on Christmas Eve or Christmas Day in the morning or around lunch. We selected the time (morning or lunch) with the largest drop compared to the previous week for further analysis. The list is ordered by the Christmas Eve dinner drop. Countries like Russia (where Orthodox Christians celebrate Christmas later, on January 7), Japan, China, Indonesia, Turkey, Israel, Thailand, Egypt, Singapore, Vietnam, and Bangladesh showed no impact during Christmas Eve dinner or Christmas Day morning or lunch.

Location | Christmas Eve Dinner Drop | Christmas Day Morning/Lunch Drop
Spain | -66% at 21:45 | -55% at 08:00, -47% at 15:30
Denmark | -67% at 18:45 | -50% at 07:00
Argentina | -60% at 22:15 (-70% at 00:00, December 25) | -60% at 08:30
Poland | -60% at 17:15 | -52% at 07:15, -33% at 15:45
Chile | -62% at 22:45 (-63% at 00:00, December 25) | -55% at 08:45
Norway | -56% at 17:45, -56% at 18:15 | -49% at 07:30, -23% at 13:30
Czech Republic | -55% at 17:45 | -51% at 06:45, -26% at 14:00
France | -54% at 20:45 | -50% at 07:00, -43% at 13:45
Portugal | -57% at 21:15 | -54% at 07:30, -47% at 14:15
Italy | -48% at 21:15 | -53% at 06:45, -55% at 13:45
Switzerland | -47% at 19:00, -50% at 21:00 | -50% at 06:45, -37% at 13:45
Germany | -46% at 19:15 | -40% at 07:15, -21% at 13:45
Brazil | -46% at 22:00 | -42% at 08:15, -35% at 13:45
Sweden | -46% at 15:15, -46% at 16:30 | -43% at 07:15, -20% at 13:15
Colombia | -49% at 22:15 (-34% at 00:00, December 25) | -55% at 07:45, -44% at 15:15
Belgium | -51% at 19:45 | -49% at 07:15
Mexico | -48% at 22:30 (-40% at 00:00, December 25) | -46% at 08:00
Finland | -45% at 15:30, -43% at 17:00-17:45 | -46% at 08:30, -34% at 14:30
Austria | -48% at 19:30 | -47% at 06:15, -29% at 14:15
United Kingdom | -34% at 16:15, -34% at 20:30 | -36% at 09:00, -43% at 14:45
Romania | -34% at 20:45 | -34% at 06:30
Ireland | -38% at 16:15, -40% at 21:00 | -42% at 09:30, -42% at 15:15
Canada | -35% at 20:30 | -35% at 09:30, -27% at 16:00
South Africa | -26% at 19:30 | -35% at 09:30, -46% at 14:30
Netherlands | -35% at 21:00 | -38% at 08:30, -40% at 16:00
United States | -30% at 19:45-20:45 | -39% at 09:30, -33% at 13:15
Australia | -30% at 21:00 | -44% at 13:45
New Zealand | -29% at 19:45 | -39% at 09:30, -44% at 13:45
Ukraine | -25% at 18:15 | -25% at 09:00, -19% at 14:30
Nigeria | -20% at 16:45, -21% at 22:30 | -22% at 13:45 (-36% at 21:45)
South Korea | -18% at 21:00 | -19% at 07:45
Malaysia | -19% at 22:15 | -22% at 09:15, -13% at 14:15
Philippines | -19% at 21:30 | -26% at 06:00
Hong Kong | -13% at 20:30 | -20% at 10:00, -17% at 16:15
Japan | -11% at 19:45 | -12% at 18:00

Many countries, though not all, experienced a noticeable drop in Internet traffic during Christmas Day lunch, with variations in timing. Spain, Poland, Norway, the Czech Republic, France, Portugal, Italy, Switzerland, Germany, Brazil, Sweden, Colombia, Finland, Austria, the United Kingdom, Ireland, Canada, South Africa, the Netherlands, the United States, New Zealand, and Ukraine all recorded significant declines, mostly in the early afternoon. In contrast, Denmark, Argentina, Chile, Belgium, Mexico, Romania, and Australia did not exhibit the same lunch decline.

New Year’s Eve: A planetary moment


Midnight on January 1 — a moment when people around the world turned away from their screens — revealed regional differences in digital behavior as people disconnected to celebrate. To accurately assess New Year’s impact, we compared traffic at 00:00 on January 1 with 00:00 on December 18 (the same time two weeks prior), avoiding Christmas distortions. This approach highlights the distinct drop in Internet activity due to the celebrations. These latest holiday patterns mirror those of 2023, with slight percentage changes and Latin American countries exhibiting larger drops than Northern Europe or some Asian regions.
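As a rough sketch of the comparison just described, the baseline can be located by stepping back exactly two weeks from the moment of interest. The function name and the traffic figures below are hypothetical and chosen only to illustrate the arithmetic; they are not taken from Cloudflare Radar.

```python
from datetime import datetime, timedelta


def change_vs_two_weeks_prior(traffic, moment):
    """Percent change at `moment` versus the same clock time two weeks earlier."""
    baseline_moment = moment - timedelta(weeks=2)
    baseline = traffic[baseline_moment]
    return (traffic[moment] - baseline) / baseline * 100


# Hypothetical traffic samples keyed by local time (arbitrary units).
traffic = {
    datetime(2024, 12, 18, 0, 0): 500.0,
    datetime(2025, 1, 1, 0, 0): 135.0,
}

midnight = datetime(2025, 1, 1, 0, 0)
change = change_vs_two_weeks_prior(traffic, midnight)
print(f"{change:.0f}%")
```

Using a two-week baseline rather than one week skips December 25 at 00:00, which would otherwise carry its own Christmas dip into the comparison.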

Latin American countries led our global analysis with the strongest drops: Chile registered a 73% decline, Argentina 68%, and Colombia a 50% drop, underscoring deep-rooted traditions that drove people to disconnect at midnight.

European nations also experienced substantial declines in Internet traffic, especially those in Latin or Southern Europe, with Romania (-60%), Italy (-58%), Portugal (-57%), and Spain (-56%) demonstrating pronounced drops, while countries like Germany (-48%) and Switzerland (-42%) also emphasized the cultural importance of New Year’s celebrations. Northern Europe, however, showed a more moderate impact, with Norway dropping by 41% and Sweden by 22%.

In contrast, North America experienced a relatively milder decrease in online activity, with traffic dropping 11% in the United States and 15% in Canada, likely due to the spread of time zones and staggered celebrations. The trend was similar in 2023, with a 12% drop in the US and 14% in Canada, reinforcing the consistency of local Internet usage patterns from year to year.

Across Asia and the Pacific, the impact varied: the Philippines (-41%), Australia (-21%), South Korea (-18%), and Singapore (-18%) showed significant declines, while Indonesia (-7%) and Malaysia (-11%) experienced a smaller drop.

In the Middle East, the United Arab Emirates saw a 29% decline, and Egypt dropped by 7%, whereas Israel recorded an 11% increase, indicating different cultural or post-celebration dynamics. The 2024 data highlighted New Year’s global influence, with patterns of reduced online activity shaped by diverse local traditions that impacted digital activity.

Locations | January 1, 00:00 drop (compared to December 18) | Locations | January 1, 00:00 drop (compared to December 18)
Chile | -73% | Australia | -21%
Argentina | -68% | Ireland | -21%
Romania | -60% | United Kingdom | -20%
Italy | -58% | France | -20%
Portugal | -57% | Hong Kong | -20%
Spain | -56% | South Africa | -19%
Colombia | -50% | South Korea | -18%
Germany | -48% | Singapore | -18%
Brazil | -48% | Thailand | -17%
Mexico | -48% | Nigeria | -17%
Switzerland | -42% | Finland | -17%
Netherlands | -41% | Taiwan | -17%
Norway | -41% | Canada | -15%
Philippines | -41% | New Zealand | -15%
Poland | -40% | China | -12%
Ukraine | -39% | United States | -11%
Belgium | -38% | Malaysia | -11%
Austria | -38% | Indonesia | -7%
Russia | -35% | Egypt | -7%
Czech Republic | -31% | Vietnam | -3%
United Arab Emirates | -29% | Saudi Arabia | 10%
Sweden | -22% | Israel | 11%

Chinese & Lunar New Year: family time


The Lunar New Year, also known as Chinese New Year or Spring Festival, is widely celebrated across Asia. It began on Wednesday, January 29, 2025, marking the start of the Year of the Snake, a symbol of wisdom and intuition. A few days prior, China’s extended holiday period began, running from January 29 to February 4, 2025.

This period is marked by Chunyun, the world’s largest annual human migration, as millions return home. Key traditions include the New Year’s Eve Reunion Dinner, fireworks, and cultural performances such as temple fairs and dragon or lion dances. In South Korea, Malaysia, and Singapore, the holiday period was shorter, lasting from January 28 to 30, 2025. Here’s Vietnam as an example, where it is also clearly evident how traffic started to decrease after January 21, 2025:


Daily Internet traffic dropped as people across Asia disconnected to celebrate. Hong Kong saw its sharpest decline on January 29 (-25%), while Singapore reached -23% on the same day. Vietnam (-24%) and Malaysia (-16%) also hit their lowest points on January 29. Taiwan’s biggest drop occurred on January 28 (-15%), while South Korea recorded moderate declines of 8% on both January 28 and 29. China experienced its largest drop on January 28 (-17%), while Indonesia saw its strongest decline on January 29 (-11%). In general, January 29 stood out as a key moment of reduced Internet traffic, though the impact varied by country.

Location

January 28

January 29

January 30

Hong Kong

-22%

-25%

-22%

Vietnam

-12%

-24%

-18%

Singapore

-17%

-23%

-16%

Malaysia

-9%

-16%

-12%

Taiwan

-15%

-14%

-12%

Indonesia

-11%

China

-17%

-9%

South Korea

-8%

-8%

The more granular traffic data revealed specific offline moments that mirrored rich cultural traditions. In China, digital activity dropped sharply on January 28 around midday (-36%) and again in the late afternoon. It also declined by 28% at 00:00 on January 29, likely reflecting deep engagement in family reunions and festivities. Hong Kong, Vietnam, and the Philippines also experienced significant declines around midnight, while Singapore, Malaysia, and Taiwan exhibited notable, though varied, drops.

Location | January 28/29 drops in traffic
China | January 28, 12:30: -36%, 18:15-20:15: -32%; January 29, 00:00: -28%, 08:00: -31%, 13:00: -19%
Singapore | January 29, 00:00: -12%, 15:00: -35%
Vietnam | January 28, 21:30: -33%; January 29, 00:00: -33%, 06:00: -40%, 18:15: -38%
Philippines | January 28, 20:30: -7%; January 29, 00:00: +3%, 06:00: -8%
Hong Kong | January 28, 19:45: -36%; January 29, 00:00: -29%, 09:30: -40%, 14:45: -35%
Malaysia | January 28, 20:30-21:45: -18%; January 29, 00:00: -12%, 09:30: -30%, 15:00: -25%, 21:15: -20%
Taiwan | January 28, 18:30: -34%; January 29, 00:00: -14%, 12:30: -26%

It’s important to note that the midnight drop in traffic during Lunar or Chinese New Year was not as pronounced as during the Gregorian calendar’s New Year, as seen in previous data.

Conclusion: traditions stand the test of time

In 2024, the trends remain strikingly consistent with those of 2023. In Europe, Christmas Eve continues to be the main offline moment, with traffic drops reaching 67% in Denmark and 66% in Spain. In North and Latin America, December 25 remained the key day, as seen with a 26% drop in the US and a drop of up to 70% at midnight in Argentina. These patterns demonstrate that traditional celebrations still heavily influence online behavior.

Across Asia, unique cultural events drive distinct periods of reduced online activity. The Lunar New Year showed peak disconnection around January 29 in China, Hong Kong, Singapore, and Vietnam. Overall, the 2024 data reinforce the enduring impact of cultural rituals on global Internet usage, an impact that Ramadan also demonstrates at a different time of year. The data also remind us that while the Internet connects billions, cultural rhythms continue to shape our relationship with technology.

If you’re interested in more trends and insights about the Internet, check out Cloudflare Radar. Follow us on social media at @CloudflareRadar (X), noc.social/@cloudflareradar (Mastodon), and radar.cloudflare.com (Bluesky), or contact us via email.

Backblaze Drive Stats for 2024

Post Syndicated from Andy Klein original https://www.backblaze.com/blog/backblaze-drive-stats-for-2024/


As of December 31, 2024, we had 305,180 drives under management. Of that number, there were 4,060 boot drives and 301,120 data drives. This report will focus on those data drives as we review the Q4 2024 annualized failure rates (AFR), the 2024 failure rates, and the lifetime failure rates for the drive models in service as of the end of 2024. Along the way, we’ll share our observations and insights on the data presented, and, as always, we look forward to you doing the same in the comments section at the end of the post.

Sign up for the Drive Stats webinar

Tune in to ask those questions you’ve had spinning ‘round your head like so many drives, and meet the new Drive Stats team—Stephanie Doyle and David Johnson of Backblaze Blog fame. Yes, you heard that right: It’s my last Drive Stats before I head off to retirement (but more on that later in the report). Read on, and sign up, for analysis and insights from the 2024 report.


Q4 2024 hard drive failure rates

As of the end of 2024, Backblaze was monitoring 301,120 hard drives used to store data. For our evaluation, we removed from consideration 487 drives, as they did not meet the criteria to be included. We’ll discuss the criteria we used in the next section of this report. Removing these drives leaves us with 300,633 hard drives to analyze. The table below shows the annualized failure rates for Q4 2024 for this collection of drives.

Notes and observations

  • 24TB drives are here. Seagate 24TB drives (model: ST24000NM002H) arrived in early December. The 1,200 drives filled one Backblaze Vault with no failed drives through the end of Q4. The 24TB Seagate drives join the 20TB Toshiba and 22TB WDC drive models in the 20-plus capacity club as we continue to dramatically increase storage capacity while optimizing existing storage server space.
  • Zero failures for the quarter. Five drive models had zero failures for the quarter starting with the 24TB Seagate drive model noted above. The others are the 4TB HGST (model: HMS5C4040ALE640), the 8TB Seagate (model: ST8000NM000A), the 14TB Seagate (model: ST14000NM000J), and the 16TB Seagate (model: ST16000NM002J). All of the zeroes come with the caveat of having a relatively small number of drives and drive days, but zero failures in a quarter is always a good thing.
  • The 4TB drives are nearly extinct. The 4TB drive count decreased by another 1,774 drives in Q4. (I discussed exactly how we migrate them in more detail if you want to dig in.) The remaining ~4,000 drives should be gone by the end of Q1 2025. They will be replaced by the incoming 20TB, 22TB, and 24TB drives. It should be noted that out of the 4TB drives in operation in Q4, only one failed, so those 20-plus TB drives have a lot to live up to from a failure perspective.
  • The quarterly failure rate is down. The AFR for Q4 dropped from 1.89% in Q3 to 1.35% in Q4. While all drive sizes delivered some improvement from Q3 to Q4, one of the primary drivers is the addition of over 14,000 new 20-plus TB drives. As a group, these drives delivered an AFR of 0.77% for the quarter.
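
The AFR figures above are quoted without restating how they are derived. As a rough sketch, assuming AFR is defined as failures per drive-year of service, expressed as a percentage (an assumption here, not something spelled out in this section), the calculation might look like the following. The function name and the second example are hypothetical.

```python
def annualized_failure_rate(failures, drive_days):
    """Return AFR as a percentage, ASSUMING AFR = failures / drive-years * 100."""
    if drive_days <= 0:
        raise ValueError("drive_days must be positive")
    drive_years = drive_days / 365
    return failures / drive_years * 100


# Zero failures gives 0% no matter how few drive days were accumulated.
print(f"{annualized_failure_rate(0, 18_000):.2f}%")

# Hypothetical model: 10 failures over 365,000 drive days (1,000 drive-years).
print(f"{annualized_failure_rate(10, 365_000):.2f}%")
```

Under this assumed definition, a single failure over 18,000 drive days would already work out to roughly 2%, which suggests why models with few drive days are treated with caution.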

Drive model criteria

We noted earlier we removed 487 drives from consideration when we produced the table above covering Q4 2024. There are two primary reasons we did not consider these drive models.

  • Testing. These are drives of a given model that we monitor and collect Drive Stats data on, but are not considered production drives at this time. For example, drives undergoing certification testing to determine if they are performant enough for our environment are not included in our Drive Stats calculations.
  • Insufficient data points. When we calculate the annualized failure rate for a drive model for a given period of time (quarterly, annual, or lifetime), we want to ensure we have enough data to reliably do so. Therefore we have defined criteria for a drive model to be included in the tables and charts for the specified period of time. Models that do not meet these criteria are not included in the tables and charts for the period in question.
Period | Drive Count | Drive Days
Quarterly | > 100 | > 10,000
Annual | > 250 | > 50,000
Lifetime | > 500 | > 100,000

Regardless of whether or not a given drive model is included in the charts and tables, all of the data for all of the drives we use is included in our Drive Stats dataset which you can download by visiting our Drive Stats page.

As with the Q4 quarterly results, we will apply these criteria to the annual and lifetime charts that follow in this report.

2024 annual hard drive failure rates

As of the end of 2024, Backblaze was monitoring 301,120 hard drives used to store data. We removed nine drive models consisting of 2,012 drives from consideration as they did not meet the annual criteria we have defined. This leaves us with 298,954 drives divided across 27 different drive models. The table below shows the AFRs for 2024 for this collection of drives.

Notes and observations

  • No zeros for the year. There were no qualifying drive models with zero failures in 2024. That said, the 16TB Seagate (model: ST16000NM002J) got close by recording just one drive failure back in Q3, giving the drive an AFR of 0.22% for 2024. 
  • Busy data center techs. During 2024, our data center techs installed 53,337 drives. If we assume there are 2,080 work hours a year (52 weeks times 40 hours), that math is 53,337/2,080, and that means our intrepid DC techs installed 26 drives per hour. Busy, busy, busy! 
  • The 24TB Seagate drives? While there were 1,200 new 24TB Seagate drives added in 2024, they were installed in early December and did not accumulate enough drive days to make the cut for the annual, or lifetime, tables. Including the 24TB Seagate drive, three models missed out on being included in the 2024 annual tables; these drive models are listed below.
MFG       Model           Drive Count   Drive Days   2024 AFR
Seagate   ST8000NM000A    247           22,684       0.84%
Seagate   ST14000NM000J   232           19,696       1.32%
Seagate   ST24000NM002H   1,200         18,000       0.00%

As a reminder, a drive model needs to have over 250 drives by the end of Q4 and accumulate at least 50,000 drive days during 2024 to be included in the annual tables.
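
The inclusion rule can be sketched in a few lines of Python. This is an illustration only, not Backblaze's actual tooling: the names `CRITERIA`, `ModelStats`, `failed_checks`, and `qualifies` are made up for the example, and both thresholds are treated as strict "greater than" comparisons, following the criteria table earlier in this report.

```python
from dataclasses import dataclass

# Minimum (drive count, drive days) per period, from the criteria table.
# Assumption: both thresholds are strict "greater than", as the table shows.
CRITERIA = {
    "quarterly": (100, 10_000),
    "annual": (250, 50_000),
    "lifetime": (500, 100_000),
}


@dataclass(frozen=True)
class ModelStats:
    model: str
    drive_count: int
    drive_days: int


def failed_checks(stats: ModelStats, period: str) -> list[str]:
    """Return the names of the criteria the model misses for the period."""
    min_count, min_days = CRITERIA[period]
    failed = []
    if stats.drive_count <= min_count:
        failed.append("drive count")
    if stats.drive_days <= min_days:
        failed.append("drive days")
    return failed


def qualifies(stats: ModelStats, period: str) -> bool:
    """A model is included only if it misses none of the criteria."""
    return not failed_checks(stats, period)


# The three models listed above as missing the 2024 annual tables.
EXCLUDED = [
    ModelStats("ST8000NM000A", 247, 22_684),
    ModelStats("ST14000NM000J", 232, 19_696),
    ModelStats("ST24000NM002H", 1_200, 18_000),
]

for stats in EXCLUDED:
    print(stats.model, "annual:", failed_checks(stats, "annual"))
```

Under these assumptions the two smaller Seagate models miss both annual criteria, while the 24TB model has more than enough drives and misses only on drive days, which is consistent with it having been installed in early December.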

Comparing Drive Stats for 2022, 2023, and 2024

The table below compares the annual failure rates by drive model for each of the last three years. The table includes just those drive models which met the annual criteria as of the end of 2024. The data for each year is inclusive of that year only for the operational drive models present at the end of each year. The table is sorted by drive size and then AFR.

Notes and observations

  • The annual AFR is down. The 2024 AFR for all drives listed was 1.57%, down from 1.70% in 2023. We expect the overall failure rates to continue to fall in 2025, but we will be watching the following for indicators.
    • The failure rates of the 8TB and 12TB drive models. All of these models will exceed five years of service. In general, the failure rate noticeably increases as drives exceed five years of service. And, while there are outliers like the current HGST 4TB drives, you can’t assume that will happen.
    • The failure rates of the 14TB and 16TB drive models. These models are approaching middle age—three to five years in operation. This is where, according to the bathtub curve, their failure rates could gradually increase—but not as severely as when they exceed five years. 
    • The failure rates for the 20TB, 22TB, and 24TB drive models. These drives will enter the flat portion of the bathtub curve, which is where their failure rate should be the lowest.

Annualized failure rates vs. drive size

Now, we can dig into the numbers to see what else we can learn. We’ll start by looking at the quarterly annualized failure rate by drive size over the last three years.

Let’s take a look at the different drive sizes and how they affect the overall annualized failure rate over time.

Minimal impact. The 4TB (blue line) drives and 10TB (gold line) drives have had little impact over the last year on the overall failure rate as each finished the year with a relatively small number of drives. Still, the wild ride delivered by the 10TB drives keeps our DC techs on their toes. 

Older drives. The 8TB (gray line) drives and 12TB (purple line) drives range in age from five to eight years and as such their overall failure rates should be increasing over time. The 12TB drives are following that pattern moving up from about 1% AFR back in 2021 to just about 3% in 2024. The failure rates of the 8TB drives, while erratic from quarter-to-quarter, have a nearly flat trendline over the same period.

Workhorse drives. The 14TB (green line) and 16TB (azure* line) drives comprise 57% of the drives in service and on average they range in age from two to four years. They are in the prime of their working lives. As such, they should have low and stable failure rates, and as you can see, they do.

*  Maybe azure isn’t quite right, but robin’s egg blue seemed a bit pretentious.

New drives on the block. The 22TB (orange line) drives are in their early days as we continue to add more drives on a regular basis. Once the drive population settles down, we’ll have a better sense of the AFR direction. Still, the early results are solid with a lifetime AFR of 1.06%.

Annualized failure rates vs. manufacturer

One of the more popular ways we can look at this data is by the drive manufacturer as we’ve done below.

To complete the picture, the chart below uses the same data, but displays just the linear trendlines for each of the manufacturers over the same three-year period.

HGST. While the HGST trendline is not pretty, it doesn’t tell the entire story. Looking at the first chart, until Q4 2023, the HGST drives were at or below the average for all of the drives, that is, across all manufacturers. Since that point, HGST has exceeded the average, and then some. The table below contains results for just the HGST drives for 2024. We’ve sorted them, high to low, by the 2024 AFR.

As you can see, there are two 12TB drive models driving the high AFR for the HGST drives. The HUH721212ALN604 model began showing signs of an increased quarterly AFR in Q1 2023 and the HUH721212ALE604 model followed suit in Q3 2024. Without these drive models, the 2024 AFR for HGST drives would be 0.55%.

Seagate. The quarterly AFR trendline decreased for the Seagate drives from 2022 through 2024. While the decrease was slight, from 2.25% to 2.0%, Seagate was the only manufacturer to do so. The decrease appears, at least in part, to be due to the removal of the Seagate 4TB drives during that period. 

Toshiba. Over the 2022 to 2024 period, the quarterly AFR for the Toshiba drive models varied within a fairly narrow range between 0.80% and 1.52%, with most quarters hovering around 1.2%. Most importantly, none of the individual drive models were outliers, as the highest quarterly AFR for any Toshiba drive model was 1.58%. We like consistency.

WDC. While WDC drive models delivered a level of consistency similar to that of the Toshiba models, they did so with a lower AFR each quarter. From 2022 through 2024, the range of quarterly AFR values for the WDC models was 0.0% to 0.85%. The 0.0% AFR was in Q1 2022 when none of the 12,207 WDC drives in operation failed during that quarter.

Lifetime hard drive stats

As of the end of 2024, Backblaze was monitoring 301,120 hard drives used to store data. Applying our drive criteria noted above for the lifetime period, we removed 11 drive models consisting of 2,736 drives from consideration as they did not meet the lifetime criteria we defined. This leaves us with 298,230 drives divided across 25 different drive models. The table below shows the lifetime AFRs for this collection of drives.

The current lifetime AFR for all of the drives is 1.31%. This is down from 1.46% in 2023. The drop is primarily due to the completion of the migration of the 4TB Seagate drives in 2024, which left us with only two of these drives still in operation as of the end of 2024. As a consequence, the 79 million drive days and over 5,600 drive failures racked up by the 4TB Seagate drives by the end of 2023 are not included in the data presented in the 2024 lifetime table above.  

In the final table below, we’ve taken the lifetime table, kept only the drive models that have a lifetime AFR of 1.50% or less, and grouped them by drive size.

A couple of caveats as you review the table.

  • There is enough data for each model to say the AFR values are solid. That said, everything could change tomorrow. In general, the hard drive failure rate follows the bathtub curve as the drives age—unless it doesn’t. Some drives refuse to fail as they age, like the 4TB HGST drives. Other drives are great, and then “hit the wall” and bend the failure curve upward, fast.
  • A drive model with a 1% annualized failure rate means that you can expect one drive out of 100 to fail in a year. If you’re a personal drive user, that one drive could be yours. If you have exactly one drive, your personal annualized failure rate is 100%. In other words, always have a backup, and don’t forget to test it.
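
To make the "one drive out of 100" reading concrete, here is a minimal sketch of the calculation. It assumes the usual Drive Stats definition, which this section does not restate: failures divided by drive years (drive days / 365), expressed as a percentage. The function name `annualized_failure_rate` is chosen for the example.

```python
def annualized_failure_rate(failures: int, drive_days: int) -> float:
    """AFR in percent, assuming AFR = failures / (drive_days / 365) * 100."""
    if drive_days <= 0:
        raise ValueError("drive_days must be positive")
    drive_years = drive_days / 365
    return failures / drive_years * 100


# Hypothetical fleet: 100 drives that each run for a full 365-day year
# accumulate 36,500 drive days. One failure in that year is a 1% AFR.
fleet_drive_days = 100 * 365
one_percent = annualized_failure_rate(failures=1, drive_days=fleet_drive_days)
print(f"{one_percent:.2f}%")
```

Because the denominator is drive days rather than drive count, the same function works for models whose drives were added or removed partway through the period.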

Migration time

I have been authoring the various Drive Stats reports for the past ten years and this will be my last one. I am retiring, or perhaps in Drive Stats vernacular, it would be “migrating.” Either way, after 10 years in the U.S. Air Force and 30+ years in Silicon Valley Tech, it is time. Drive Stats will continue with Stephanie Doyle and David Johnson as the replacement drive models beginning with the Q1 2025 report. I wish them well.

I want to say thank you to each of you who have taken your time to peruse and engage with the Drive Stats reports and data over the last 10 years. And, thank you as well for the comments, questions, and discussions that raced and raged across the various communities that care about something as mundane and awesome as a hard drive. It has been quite the ride—thanks again.

The Hard Drive Stats data

The complete data set used to create the tables and charts in this report is available on our Hard Drive Test Data page. You can download and use this data for free for your own purpose. All we ask are three things: 1) you cite Backblaze as the source if you use the data, 2) you accept that you are solely responsible for how you use the data, and 3) you do not sell this data itself to anyone; it is free.

Good luck, and let us know if you find anything interesting.

The post Backblaze Drive Stats for 2024 appeared first on Backblaze Blog | Cloud Storage & Cloud Backup

Teaching AI safety: Lessons from Romanian educators

Post Syndicated from Elena Coman original https://www.raspberrypi.org/blog/teaching-ai-safety-lessons-from-romanian-educators/

This blog post has been written by our Experience AI partners in Romania, Asociația Techsoup, who piloted our new AI safety resources with Romanian teachers at the end of 2024.

Last year, we had the opportunity to pedagogically test the three new resources on AI safety and see first-hand the transformative effect they have on teachers and students. Here’s what we found.

Students in class.

Romania struggles with the digital skills gap

To say the internet is ubiquitous in Romania is an understatement: Romania has one of the fastest internet connections in the world (11th place), an impressive mobile internet penetration (86% of the population), and the highest share of people online in Central and Eastern Europe (89% of the entire population). Unsurprisingly, most of Romania’s internet users are also social media users.

When you combine that with recent national initiatives, such as

  • The introduction of Information Technology and Informatics in the middle-school curriculum in 2017 as a compulsory subject
  • A Digital Agenda as a national strategy since 2015 
  • Allocation of over 20% of its most recent National Recovery and Resilience Fund for digital transition

one might expect a similar lead in digital skills, both basic and advanced.

But only 28% of the population, well below the 56% EU average, and just 47% of young people between 16 and 24 have basic digital skills — the lowest percentage in the European Union. 

Findings from the latest International Computer and Information Literacy Study (ICILS, 2023)  underscore the urgent need to improve young people’s digital skills. Just 4% of students in Romania were scored at level 3 of 4, meaning they can demonstrate the capacity to work independently when using computers as information gathering and management tools, and are able, for example, to recognise that the credibility of web‐based information can be influenced by the identity, expertise, and motives of the people who create, publish, and share it.

Students use a computer in class.

Furthermore, 33% of students were assessed as level 1, while a further 40% of students did not even reach the minimum level set out in the ICILS, which means that they are unable to demonstrate even basic operational skills with computers or an understanding of computers as tools for completing simple tasks. For example, they can’t use computers to perform routine research and communication tasks under explicit instruction, and can’t manage simple content creation, such as entering text or images into pre‐existing templates.

Why we wanted to pilot the Experience AI safety resources

Add AI — and particularly generative AI — to this mix, and it spells huge trouble for educational systems unprepared for the fast rate of AI adoption by their students. Teachers need to be given the right pedagogical tools and support to address these new disruptions and the AI-related challenges that are adding to the existing post-pandemic ones.

This is why we at Asociația Techsoup have been enthusiastically supporting Romanian teachers to deliver the Experience AI curriculum created by the Raspberry Pi Foundation and Google DeepMind. We have found it to be the best pedagogical support that prepares students to fully understand AI and to learn how to use machine learning to solve real-world problems.

Testing the resources

Last year, we had the opportunity to pedagogically test the three new resources on AI safety and see first-hand the transformative effect they have on teachers and students.

Students in class.

We worked closely with 8 computer science teachers in 8 Romanian schools from rural and small urban areas, reaching approximately 340 students between the ages of 13 and 18.

Before the teachers used the resources in the classroom, we worked with them in online community meetings and one-to-one phone conversations to help them review the available lesson plans, videos, and activity guides, to familiarise themselves with the structure, and to plan how to adapt the sessions to their classroom context. 

In December 2024, the teachers delivered the resources to their students. They guided students through key topics in AI safety, including understanding how to protect their data, critically evaluating data to spot fake news, and how to use AI tools responsibly. Each session incorporated a dynamic mix of teaching methods, including short videos and presentations delivering core messages, unplugged activities to reinforce understanding, and structured discussions to encourage critical thinking and reflection. 

Gathering feedback from users

We then interviewed all the teachers to understand their challenges in delivering such a new curriculum and we also observed two of the lessons. We took time to discuss with students and gather in-depth feedback on their learning experiences, perspectives on AI safety, and their overall engagement with the activities, in focus groups and surveys.

Feedback gathered in this pilot was then incorporated into the resources and recommendations given to teachers as part of the AI safety materials.

Teachers’ perspectives on the resources

It quickly became obvious to both us and our teachers that the AI safety resources address a growing and unmet need: preparing our students for the presence of AI tools, which are on the road to becoming as ubiquitous as the internet itself.

A teacher and students in class.

Teachers evaluated the resources as very effective, giving them the opportunity to have authentic and meaningful conversations with their students about the world we live in. The format of the lessons was engaging — one of the teachers was so enthusiastic that she actually managed to keep students away from their phones for the whole lesson. 

They also appreciated the pedagogical quality of the resources, especially the fact that everything is ready to use in class and free to access. In interviews, they said that they themselves learnt a lot from the lessons:

“For me it was a wake-up call. I was living in my bubble, in which I don’t really use these tools that much. But the world we live in is no longer the world I knew. … So such a lesson also helps us to learn and to discover the children in another context,” – Carmen Melinte, a computer science teacher at the Colegiul Național Grigore Moisil in the small city of Onești, in north-east Romania, one of the EU regions with the greatest poverty risk.

What our students think about the resources

Students enjoyed discussing real-world scenarios and admitted that they don’t really have adults around them whom they can talk to about the AI tools they use. They appreciated the interactive activities where they worked in pairs or groups and the games where they pretended to be creators of AI apps, thinking about safety features they could implement:

“I had never questioned AI, as long as it did my homework,” said one student in our focus groups, where the majority of students admitted that they are already using large language models (LLMs) for most of their homework.

“I really liked that I found out what is behind that ‘Accept all’ and now I think twice before giving my data,” – Student at the end of the ‘Your data and AI’ activities.

“Activities put me in a situation where I had to think from the other person’s shoes and think twice before sharing my personal data,” commented another student.

Good starting point

This is a good first step: there is an acute need for conversations between young people and adults around AI tools, how to think about them critically, and how to use them safely. School is the right place to start these conversations and activities, as teachers are still trusted by most Romanian students to help them understand the world.

Students use a computer in class.

But to be able to do that, we need to be serious about equipping teachers with pedagogically sound resources that they can use in class, as well as training them, supporting them, and making sure that most of their time is dedicated to teaching, and not administration. It might seem a slow process, but it is the best way to help our students become responsible, ethical and accountable digital citizens.

We are deeply grateful to the brave, passionate teachers in our community who gave the AI safety resources a try and of course to our partners at the Raspberry Pi Foundation for giving us the opportunity to lead this pilot.

If you are a teacher anywhere in the world, give the AI safety resources a try today to celebrate Safer Internet Day: rpf.io/aisafetyromania

The post Teaching AI safety: Lessons from Romanian educators appeared first on Raspberry Pi Foundation.

Feeding Zabbix MQTT Data with Ivo Schooneman

Post Syndicated from Michael Kammer original https://blog.zabbix.com/feeding-zabbix-mqtt-data-with-ivo-schooneman/29650/

This year’s Zabbix Conference Benelux is rapidly approaching, and to whet our audience’s appetite we sat down for a short interview with one of the conference’s featured presenters. Open Source Consultant Ivo Schooneman works for our Certified Partner Xifeo ICT B.V., and we quizzed him about how he got started in the open-source movement, what led him down the path to Zabbix, and how he sees Zabbix and MQTT (Message Queuing Telemetry Transport) fitting into a new world of connected and smart devices.

Please tell us a bit about yourself and how you got to this point in your career.

I started using Linux and open-source software in the mid-1990s, building my own home server for email, a website, and a firewall. After my internship in a Linux team, I got a job offer that I accepted. From there I became a consultant, helping small companies to build their servers. Eventually, I moved to a bigger company to offer consultancy services to banks, telecom providers, television stations, and streaming providers.

When you work in a company with a lot of open source enthusiasts, you hear a lot about new products and keep growing to make the environment for your customers better every day. I completed a lot of certifications and training courses – AWS, RHCA, Python, and many more. At some point, I wanted to move on, so I joined Xifeo, where I got the boost I needed to specialize in Zabbix.

How long have you been using Zabbix? What kind of Zabbix-related tasks do you and your team get involved in daily?

I’ve been using Zabbix for about 4 years. When I joined Xifeo in 2022, I took the opportunity to give myself a boost by gaining my Zabbix Certified User, Zabbix Certified Specialist, and Zabbix Certified Professional certifications. I’ve grown to trust it so much that even my home network is monitored with Zabbix!

Can you give us a few clues about what we can expect to hear during your presentation at Zabbix Conference Benelux?

I will talk about the history of MQTT, where it started, and how it works with pub/sub, and I will go through some scenarios where it could be used – there are always other options. After that, we will take a look at the plugin configuration and see how to split your different subscriptions to multiple topics and servers.
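
For readers new to pub/sub, the idea can be sketched in plain Python. This is not the Zabbix plugin or its configuration: `TinyBroker` and the `home/...` topic names are invented for illustration. The only real MQTT detail used is the topic filter wildcards, where `+` matches exactly one topic level and `#` matches all remaining levels. Special handling of topics that start with `$` is left out.

```python
from collections import defaultdict


def topic_matches(topic_filter: str, topic: str) -> bool:
    """Check an MQTT-style topic filter against a concrete topic name."""
    filter_levels = topic_filter.split("/")
    topic_levels = topic.split("/")
    for i, level in enumerate(filter_levels):
        if level == "#":
            # Multi-level wildcard: only valid as the last filter level,
            # and it matches the parent level plus everything below it.
            return i == len(filter_levels) - 1
        if i >= len(topic_levels):
            return False
        if level != "+" and level != topic_levels[i]:
            return False
    return len(filter_levels) == len(topic_levels)


class TinyBroker:
    """Hypothetical in-memory broker: routes each message to matching subscribers."""

    def __init__(self):
        self._subscriptions = defaultdict(list)  # topic filter -> callbacks

    def subscribe(self, topic_filter, callback):
        self._subscriptions[topic_filter].append(callback)

    def publish(self, topic, payload):
        delivered = 0
        for topic_filter, callbacks in self._subscriptions.items():
            if topic_matches(topic_filter, topic):
                for callback in callbacks:
                    callback(topic, payload)
                    delivered += 1
        return delivered


broker = TinyBroker()
power_readings, everything = [], []
broker.subscribe("home/+/power", lambda t, p: power_readings.append((t, p)))
broker.subscribe("home/#", lambda t, p: everything.append((t, p)))

broker.publish("home/kitchen/power", "42.5")
broker.publish("home/kitchen/temperature", "21.0")
print(power_readings)
print(len(everything))
```

The publisher never needs to know who is listening. Each subscriber picks the slice of topics it cares about, which is what makes splitting subscriptions across topics practical.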

In your experience, does Zabbix lend itself easily to the IoT generally and MQTT data specifically?

I never feel like it’s necessary to have the color of my lights in Zabbix, but the power usage of my appliances, temperatures, particulate matter in my rooms, that’s data you can monitor and alert on! MQTT is a great and easy way to do that, as writing API calls for every device would be very time consuming.

What changes do you think MQTT data and the IoT will bring to the world of monitoring over the next decade or so?

With more and more devices capable of measuring different things, we can gather more data than ever before. As the prices of sensors keep dropping, we can measure just about anything we want, anywhere we want. By using a standard protocol, such as MQTT, you will have a uniform way of handling the data. IoT will help us save power and make life easier!

The post Feeding Zabbix MQTT Data with Ivo Schooneman appeared first on Zabbix Blog.
