Amazon MSK Serverless Now Generally Available: No More Capacity Planning for Your Managed Kafka Clusters

Today we are making Amazon MSK Serverless generally available to help you further reduce the operational overhead of managing an Apache Kafka cluster by offloading capacity planning and scaling to AWS.

In May 2019, we launched Amazon Managed Streaming for Apache Kafka to help our customers stream data using Apache Kafka. Apache Kafka is an open-source platform that enables customers to capture streaming data like clickstream events, transactions, and IoT events. Apache Kafka is a common solution for decoupling applications that produce streaming data (producers) from those consuming the data (consumers). Amazon MSK makes it easy to ingest and process streaming data in real time with fully managed Apache Kafka clusters.

Amazon MSK reduces the work needed to set up, scale, and manage Apache Kafka in production. With Amazon MSK, you can create a cluster in minutes and start sending data. Apache Kafka runs as a cluster on one or more brokers. Brokers are instances with a given compute and storage capacity distributed in multiple AWS Availability Zones to create high availability. Apache Kafka stores records on topics for a user-defined period of time, partitions those topics, and then replicates these partitions across multiple brokers. Data producers write records to topics, and consumers read records from them.

When creating a new Amazon MSK cluster, you need to decide the number of brokers, the size of the instances, and the storage that each broker has available. The performance of an MSK cluster depends on these parameters. These settings can be easy to provide if you already know the workload. But how will you configure an Amazon MSK cluster for a new workload? Or for an application that has variable or unpredictable data traffic?

Amazon MSK Serverless
Amazon MSK Serverless automatically provisions and manages the required resources to provide on-demand streaming capacity and storage for your applications. It is the perfect solution to get started with a new Apache Kafka workload where you don’t know how much capacity you will need or if your applications produce unpredictable or highly variable throughput and you don’t want to pay for idle capacity. Also, it is great if you want to avoid provisioning, scaling, and managing resource utilization of your clusters.

Amazon MSK Serverless comes with several security features out of the box: private connectivity, which means that traffic doesn’t leave the AWS backbone; AWS Identity and Access Management (IAM) access control; and encryption of your data at rest and in transit.

An Amazon MSK Serverless cluster scales capacity up and down instantly based on the application requirements. When Apache Kafka clusters are scaled horizontally (that is, more brokers are added), you also need to move partitions to these new brokers to make use of the added capacity. With Amazon MSK Serverless, you don’t need to scale brokers or do partition movement.

Each Amazon MSK Serverless cluster provides up to 200 MBps of write-throughput and 400 MBps of read-throughput. It also allocates up to 5 MBps of write-throughput and 10 MBps of read-throughput per partition.
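
To get a feel for what these limits mean when you choose a partition count, here is a minimal sketch in Python. It assumes that traffic is spread evenly across partitions, which real workloads may not do, and the function name and the example workload are hypothetical.

```python
import math

# Service limits quoted in this post, in MBps.
CLUSTER_WRITE_MBPS = 200
CLUSTER_READ_MBPS = 400
PARTITION_WRITE_MBPS = 5
PARTITION_READ_MBPS = 10


def min_partitions(write_mbps: float, read_mbps: float) -> int:
    """Smallest partition count whose per-partition limits cover the target.

    Assumes an even spread of traffic across partitions (an idealization).
    """
    if write_mbps > CLUSTER_WRITE_MBPS or read_mbps > CLUSTER_READ_MBPS:
        raise ValueError("target throughput exceeds the per-cluster limits")
    by_write = math.ceil(write_mbps / PARTITION_WRITE_MBPS)
    by_read = math.ceil(read_mbps / PARTITION_READ_MBPS)
    return max(1, by_write, by_read)


# Hypothetical workload: 30 MBps of writes and 45 MBps of reads.
print(min_partitions(30, 45))  # prints 6
```

With these example numbers the write side is the bottleneck: 30 MBps at 5 MBps per partition needs six partitions, while the read side needs only five.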

Amazon MSK Serverless pricing is based on throughput. You can learn more on the MSK’s pricing page.

Let’s see it in action
Imagine that you are the architect of a mobile game studio, and you are about to launch a new game. You invested in the game’s marketing, and you expect it will have a lot of new players. Your games send clickstream data to your backend application. The data is analyzed in real time to produce predictions on your players’ behaviors. With these predictions, your games make real-time offers that suit the current player’s behavior, encouraging them to stay in the game longer.

Your games send clickstream data to an Apache Kafka cluster. As you are using an Amazon MSK Serverless cluster, you don’t need to worry about scaling the cluster when the new game launches, as it will adjust its capacity to the throughput.

The following image shows a graph from the launch day of the new game. The orange line is the MessagesInPerSec metric, the number of messages per second that the cluster is receiving. It starts at 100 messages per second, our baseline before the launch, and then climbs to 300, 600, and 1,000 messages per second as more and more players download and play the game. You can feel confident that the volume of records can keep increasing: Amazon MSK Serverless can ingest all the records as long as your application throughput stays within the service limits.

Graph of messages in per second to the cluster

How to get started with Amazon MSK Serverless
Creating an Amazon MSK Serverless cluster is very simple, as you don’t need to provide any capacity configuration to the service. You can create a new cluster on the Amazon MSK console page.

Choose the Quick create cluster creation method, which provides best-practice settings for a starter cluster, and enter a name for your cluster.

Create a cluster

Then, in the General cluster properties, choose the cluster type. Choose the Serverless option to create an Amazon MSK Serverless cluster.

General cluster properties

Finally, it shows all the cluster settings that it will configure by default. You cannot change most of these settings after the cluster is created. If you need different values for these settings, you might need to create the cluster using the Custom create method. If the default settings work for you, then create the cluster.

Cluster settings page

Creating the cluster will take you a few minutes, and after that, you see the Active status on the Cluster summary page.

Cluster information page

Now that you have the cluster, you can start sending and receiving records using an Amazon Elastic Compute Cloud (Amazon EC2) instance. The first step is to create a new IAM policy and IAM role, because the instance needs to authenticate using IAM to access the cluster.

Amazon MSK Serverless integrates with IAM to provide fine-grained access control to your Apache Kafka workloads. You can use IAM policies to grant least privileged access to your Apache Kafka clients.

Create the IAM policy
Create a new IAM policy with the following JSON. This policy will give permissions to connect to the cluster, create a topic, send data, and consume data from the topic.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "kafka-cluster:Connect",
                "kafka-cluster:DescribeCluster"
            ],
            "Resource": "arn:aws:kafka:<REGION>:<ACCOUNTID>:cluster/msk-serverless-tutorial/cfeffa15-431c-4af4-8725-42636fab9937-s3"
        },
        {
            "Effect": "Allow",
            "Action": [
                "kafka-cluster:DescribeTopic",
                "kafka-cluster:CreateTopic",
                "kafka-cluster:WriteData",
                "kafka-cluster:ReadData"
            ],
            "Resource": "arn:aws:kafka:<REGION>:<ACCOUNTID>:topic/msk-serverless-tutorial/cfeffa15-431c-4af4-8725-42636fab9937-s3/msk-serverless-tutorial"
        },
        {
            "Effect": "Allow",
            "Action": [
                "kafka-cluster:AlterGroup",
                "kafka-cluster:DescribeGroup"
            ],
            "Resource": "arn:aws:kafka:<REGION>:<ACCOUNTID>:group/msk-serverless-tutorial/cfeffa15-431c-4af4-8725-42636fab9937-s3/*"
        }
    ]
}

Make sure that you replace the Region and account ID with your own. Also, you need to replace the cluster, topic, and group ARN. To get these ARNs, you can go to the cluster summary page and get the cluster ARN. The topic ARN and the group ARN are based on the cluster ARN. Here, the cluster and the topic are named msk-serverless-tutorial.
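
Because the topic and group ARNs share everything with the cluster ARN except the resource type and a trailing name, you can derive them mechanically. The following Python sketch shows one way to do that. The helper name derive_arns is hypothetical, and the Region and account ID in the example are placeholders, not values from this walkthrough.

```python
def derive_arns(cluster_arn: str, topic: str, group: str = "*") -> dict:
    """Build topic and group ARNs from an MSK cluster ARN.

    Expects the form arn:aws:kafka:<region>:<account>:cluster/<name>/<uuid>.
    """
    prefix, separator, cluster_path = cluster_arn.partition(":cluster/")
    if not separator:
        raise ValueError("not an MSK cluster ARN: " + cluster_arn)
    return {
        "cluster": cluster_arn,
        "topic": f"{prefix}:topic/{cluster_path}/{topic}",
        "group": f"{prefix}:group/{cluster_path}/{group}",
    }


# Placeholder Region and account ID; the cluster name and ID match the policy above.
arns = derive_arns(
    "arn:aws:kafka:us-east-1:123456789012:cluster/"
    "msk-serverless-tutorial/cfeffa15-431c-4af4-8725-42636fab9937-s3",
    topic="msk-serverless-tutorial",
)
for kind, arn in arns.items():
    print(kind, arn)
```

The default group value of "*" mirrors the wildcard used in the group resource of the policy.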


Then create a new role with the use case EC2 and attach this policy to the role.

Create a new role

Create a new EC2 instance
Now that you have the cluster and the role, create a new Amazon EC2 instance. Add the instance to the same VPC, subnet, and security group as the cluster. You can find that information on your cluster properties page in the networking settings. Also, when configuring the instance, attach the role that you just created in the previous step.

Cluster networking configuration

When you are ready, launch the instance. You are going to use the same instance to produce and consume messages. To do that, you need to set up Apache Kafka client tools in the instance. You can follow the Amazon MSK developer guide to get your instance ready.

Producing and consuming records
Now that you have everything configured, you can start sending and receiving records using Amazon MSK Serverless. The first thing you need to do is to create a topic. From your EC2 instance, go to the directory where you installed the Apache Kafka tools and export the bootstrap server endpoint.

cd kafka_2.13-3.1.0/bin/
export BS=<your-bootstrap-server-endpoint>

As you are using Amazon MSK Serverless, there is only one address for this server, and you can find it in the client information on your cluster page.

Viewing client information

Run the following command to create a topic with the name msk-serverless-tutorial.

./kafka-topics.sh --bootstrap-server $BS \
--command-config <client-properties-file> \
--create --topic msk-serverless-tutorial --partitions 6

Now you can start sending records. If you want to see the service work under a high throughput, you can use the Apache Kafka producer performance test tool. This tool allows you to send many messages at the same time to the MSK cluster with a defined throughput and specific size. Experiment with this performance test tool, change the number of messages per second and the record size and see how the cluster behaves and adapts its capacity.

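./kafka-producer-perf-test.sh --topic msk-serverless-tutorial \
--throughput <RECORDS_PER_SECOND> --num-records <NUM_RECORDS> --record-size <RECORD_SIZE_BYTES> \
--producer-props bootstrap.servers=$BS \
--producer.config <client-properties-file>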

Finally, if you want to receive the messages, open a new terminal, connect to the same EC2 instance, and use the Apache Kafka consumer tool to receive the messages.

cd kafka_2.13-3.1.0/bin/
./kafka-console-consumer.sh \
--bootstrap-server $BS \
--consumer.config <client-properties-file> \
--topic msk-serverless-tutorial --from-beginning

You can see how the cluster is doing on the monitoring page of the Amazon MSK Serverless cluster.

Cluster metrics page

Amazon MSK Serverless is available in US East (Ohio), US East (N. Virginia), US West (Oregon), Europe (Frankfurt), Europe (Ireland), Europe (Stockholm), Asia Pacific (Singapore), Asia Pacific (Sydney), and Asia Pacific (Tokyo).
Learn more about this service and its pricing on the Amazon MSK Serverless feature page.


Inflation and buying real estate: are we saving our savings?

Is now the time for the apartment?

According to statistics from the Bulgarian National Bank (BNB), housing loans grew by 18.3% in the year to March 2022.
That is no small increase. Unfortunately, the BNB publishes it only as the total value of the loans, not as their number. For that reason, it’s impossible to say whether the growth in lending comes from increased demand or from higher prices.
That’s the BNB; what can you do? They will do as they please as long as the law allows it.

In my practice, I see stronger demand. Prices are rising too, of course, but demand has definitely grown.

Until a year ago, the growth in demand was driven by buying property as a form of investment. That’s also why the number of properties rented out on Airbnb grew. Right now, however, the engine of property purchases is FEAR OF INFLATION.

In the typical case, people have around 50,000-60,000 leva saved and are thinking about how to protect it.
Unfortunately, there aren’t many investment alternatives in Bulgaria. Many finance professionals argue with me that the stock markets are a good option, but I don’t think so. How do you imagine the average person in Bulgaria investing through Western stock exchanges?

And people are right, to some extent


Official inflation is currently 10.2%, but in my view it is considerably higher. Let’s not forget that just a month ago the budget was calculated on the basis of 5.4% inflation. It isn’t very professional of the finance ministry to be unable to predict prices a month ahead, but as the saying goes, we work with what we have.

My subjective opinion is that inflation is heading toward 20%, and that won’t be the end of it. That’s why there’s no point in holding cash. My advice: keep enough cash to cover 6-7 months of expenses and put the rest somewhere. If there’s nothing else, property is an option too, AS LONG AS YOU DON’T OVERSTRETCH YOURSELF WITH THE SIZE OF THE LOAN!

How large the loan should be is a topic for another conversation and quite an individual matter (depending on income and profession), but in any case it’s best for savings to be held in some kind of asset.

Inflation is higher than the interest rate

As long as inflation is higher than the interest rate on your loan, you come out ahead. Right now, interest rates on housing loans are below 3%. There is one problem, though: is your income protected?
Let’s not forget that during inflation some companies cut staff and consumption shrinks. This is where a specific self-assessment comes in: are you a valued employee, and is the business you work in affected by inflation?
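
To put a number on that claim, here is a minimal Python sketch of the real interest rate using the standard Fisher relation. It takes 3% as the loan rate (the article only says "below 3%", so this is an upper bound) and the official 10.2% inflation figure, and it assumes both stay constant for a year, which is a simplification.

```python
def real_rate(nominal_rate: float, inflation: float) -> float:
    """Real interest rate from the Fisher relation: (1 + i) / (1 + pi) - 1."""
    return (1 + nominal_rate) / (1 + inflation) - 1


loan_rate = 0.03             # assumed: the article says rates are "below 3%"
official_inflation = 0.102   # official figure quoted in the article

print(f"{real_rate(loan_rate, official_inflation):.2%}")  # prints -6.53%
```

A negative real rate means the debt loses value in real terms faster than interest accrues on it, which is the sense in which the borrower comes out ahead, provided their income keeps up.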

The most unpleasant situation is when you work for a foreign company and it decides to cut staff. That always happens first in the smallest and most remote economy, which is what we are.
Naturally, Western companies are currently the preferred place to work, but that hasn’t always been the case, especially in times of crisis. I have witnessed businesses being shut down literally overnight.

Private businesses with loans are the most at risk

When you have a business and take out a company loan, banks always require the owner to become a guarantor for the company. This is definitely quite risky, and if the business runs into trouble, you don’t have much time to react. There aren’t many of us who are able to help at such a moment.

In general, with a bad loan you have to act very quickly if you don’t want to end up with frozen accounts and liens on your property.

For meetings and consultations about banking troubles, please use the form provided.


But let’s return to property as an option during inflation. Yes, it’s a good option, but you shouldn’t overdo it. You need to weigh quite a few factors: price, loan size, profession, income, percentage financed, the risk you are taking on... Once again, it turns out that real estate may well save Bulgarians’ savings. But caution is needed!

Vasil Kendov, financial consultant


Secure data movement across Amazon S3 and Amazon Redshift using role chaining and ASSUMEROLE

Data lakes use a ring of purpose-built data services around a central data lake. Data needs to move between these services and data stores easily and securely. The following are some examples of such services:

  • Amazon Simple Storage Service (Amazon S3), which stores structured, unstructured, and semi-structured data
  • Amazon Redshift, a fully managed, petabyte-scale data warehouse product to analyze large-scale structured and semi-structured data across data warehouses and operational databases
  • Amazon SageMaker, which consumes data for machine learning (ML) capabilities

In multi-tenant architectures, groups or users within a group may require exclusive permissions to the group’s S3 bucket and also the schema and tables belonging to Amazon Redshift. These teams also need to be able to control loading and unloading of data between the team-owned S3 buckets and Amazon Redshift schemas. Additionally, individual users within the team may require fine-grained control over objects in S3 buckets and specific schemas in Amazon Redshift. Implementing this permissions control use case should be scalable as more teams and users are onboarded and permission-separation requirements evolve.

Amazon Redshift and Amazon S3 provide a unified, natively integrated storage layer for data lakes. You can move data between Amazon Redshift and Amazon S3 using the Amazon Redshift COPY and UNLOAD commands.

This post presents an approach that you can apply at scale to achieve fine-grained access controls to resources in S3 buckets and Amazon Redshift schemas for tenants, including groups of users belonging to the same business unit down to the individual user level. This solution provides tenant isolation and data security. In this approach, we use the bridge model to store data and control access for each tenant at the individual schema level in the same Amazon Redshift database. We utilize ASSUMEROLE and role chaining to provide fine-grained access control when data is being copied and unloaded between Amazon Redshift and Amazon S3, so the data flows within each tenant’s namespace. Role chaining also streamlines the new tenant onboarding process.

Solution overview

In this post, we explore how to achieve resource isolation, data security, scaling to multiple tenants, and fine-grained access control at the individual user level for teams that access, store, and move data across storage using Amazon S3 and Amazon Redshift.

We use the bridge model to store data and control access for each tenant at the individual schema level in the same Amazon Redshift database. In the bridge model, a separate database schema is created for each tenant, and data for each tenant is stored in its own schema. The tenant has access only to its own schema.

We use the COPY and UNLOAD commands to load data into and unload data from the Amazon Redshift cluster through an S3 bucket. These commands require Amazon Redshift to access Amazon S3 on your behalf, so security credentials must be provided to your cluster.

We create an AWS Identity and Access Management (IAM) role—we call it the Amazon Redshift onboarding role—and associate it with the Amazon Redshift cluster. For each tenant, we create a tenant-specific IAM role—we call it the tenant role—to define the fine-grained access to its own Amazon S3 resources. The Amazon Redshift onboarding role doesn’t have any permissions granted except allowing sts:AssumeRole to the tenant roles. The trust relationship to the Amazon Redshift onboarding role is defined in each of the tenant roles. We use the Amazon Redshift ASSUMEROLE privilege to control IAM role access privileges for database users and groups on COPY and UNLOAD commands.

Each tenant database user or group is granted ASSUMEROLE on the Amazon Redshift onboarding role and its own tenant role, which restricts the tenant to access its own Amazon S3 resources when using COPY and UNLOAD commands. We use role chaining when ASSUMEROLE is granted. This means that the tenant role isn’t required to be attached to the Amazon Redshift cluster, and the only IAM role associated is the Amazon Redshift onboarding role. Role chaining streamlines the new tenant onboarding process. With role chaining, we don’t need to modify the cluster; we can make modifications on the tenant IAM role definition when onboarding a new tenant.

For our use case, we have two tenants: team 1 and team 2. A tenant here represents a group of users—a team from the same business unit. We want separate S3 buckets and Amazon Redshift schemas for each team. These teams should be able to access their own data only and also be able to support fine-grained access control over copying and unloading data from Amazon S3 to Amazon Redshift (and vice versa) within the team boundary. We can apply access control at the individual user level using the same approach.

The following architecture diagram shows the AWS resources and process flow of our solution.

In this tutorial, you create two S3 buckets, two Amazon Redshift tenant schemas, two Amazon Redshift tenant groups, one Amazon Redshift onboarding role, and two tenant roles. Then you grant ASSUMEROLE on the onboarding and tenant role to each tenant, using role chaining. To verify that each tenant can only access its own S3 resources, you create two Amazon Redshift users assigned to their own tenant group and run COPY and UNLOAD commands.


To follow along with this solution, you need the following prerequisites:

Download the source code to your local environment

To implement this solution in your local development environment, you can download the source code from the GitHub repo or clone the source code using the following command:

git clone

The following files are included in the source code:

  • – A CloudFormation template to deploy the Amazon Redshift onboarding role redshift-s3-onboarding-role
  • – A CloudFormation template to deploy an S3 bucket, KMS key, and IAM role for each tenant you want to onboard

Provision an IAM role for Amazon Redshift and attach this role to the Amazon Redshift cluster

Deploy the template using the AWS CloudFormation console or the AWS Command Line Interface (AWS CLI). For more information about stack creation, see Create the stack. This template doesn’t have any required parameters. The stack provisions an IAM role named redshift-s3-onboarding-role for Amazon Redshift. The following code is the policy defining sts:AssumeRole to the tenant-specific IAM roles:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Action": [
        "sts:AssumeRole"
      ],
      "Resource": [
        "arn:aws:iam::<account id>:role/*-tenant-redshift-s3-access-role"
      ],
      "Effect": "Allow"
    }
  ]
}

Navigate to the Amazon Redshift console and select the cluster you want to update. On the Actions menu, choose Manage IAM roles. Choose the role redshift-s3-onboarding-role to associate with the cluster. For more information, see Associate the IAM role with your cluster.

Provision the IAM role and resources for tenants

Deploy the template using the AWS CloudFormation console or the AWS CLI. For this post, you deploy the stack twice, supplying two unique tenant names for TenantName. For example, you can use team1 and team2 as the TenantName parameter values.

For each tenant, the stack provisions the following resources:

  • A KMS key
  • An S3 bucket named team1-data-<account id>-<region> with default encryption enabled with SSE-KMS using the created key
  • An IAM role named team1-tenant-redshift-s3-access-role

The policy attached to the role team1-tenant-redshift-s3-access-role can only access the team’s own S3 bucket. The role redshift-s3-onboarding-role is trusted to assume all tenant roles *-tenant-redshift-s3-access-role to enable role chaining. The tenant role *-tenant-redshift-s3-access-role has a trust relationship to redshift-s3-onboarding-role. See the following policy code:

            "Action": [
            "Resource": [
                "arn:aws:s3:::team1-data-<account id>-<region>/*",
                "arn:aws:s3:::team1-data-<account id>-<region>"
            "Effect": "Allow"

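The trust relationship on each tenant role isn’t shown in the excerpt above. As a rough illustration only, the following Python sketch builds what such a trust policy document might look like, using the standard IAM trust policy shape. The function name and the example account ID are hypothetical, and the actual template in the source code may define the trust differently.

```python
import json


def tenant_trust_policy(account_id: str,
                        onboarding_role: str = "redshift-s3-onboarding-role") -> dict:
    """Trust policy letting the onboarding role assume a tenant role.

    Illustrative sketch; not taken from the post's CloudFormation template.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {
                    "AWS": f"arn:aws:iam::{account_id}:role/{onboarding_role}"
                },
                "Action": "sts:AssumeRole",
            }
        ],
    }


# 123456789012 is a placeholder account ID.
print(json.dumps(tenant_trust_policy("123456789012"), indent=2))
```

Because the trust lives on the tenant role, onboarding a new tenant only requires creating a new role with this trust; nothing attached to the cluster changes.
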
Create a tenant schema and tenant user with appropriate privileges

For this post, you create the following Amazon Redshift database objects using the query editor on the Amazon Redshift console or a SQL client tool like SQL Workbench/J. Replace <password> with your password and <account id> with your AWS account ID before running the following SQL statements:

create schema team1;
create schema team2;

create group team1_grp;
create group team2_grp;

create user team1_usr with password '<password>' in group team1_grp;
create user team2_usr with password '<password>' in group team2_grp;

grant usage on schema team1 to group team1_grp;
grant usage on schema team2 to group team2_grp;

GRANT ALL ON SCHEMA team1 TO group team1_grp;
GRANT ALL ON SCHEMA team2 TO group team2_grp;

revoke assumerole on all from public for all;

grant assumerole
on 'arn:aws:iam::<account id>:role/redshift-s3-onboarding-role,arn:aws:iam::<account id>:role/team1-tenant-redshift-s3-access-role'
to group team1_grp for copy;

grant assumerole
on 'arn:aws:iam::<account id>:role/redshift-s3-onboarding-role,arn:aws:iam::<account id>:role/team1-tenant-redshift-s3-access-role'
to group team1_grp for unload;

grant assumerole
on 'arn:aws:iam::<account id>:role/redshift-s3-onboarding-role,arn:aws:iam::<account id>:role/team2-tenant-redshift-s3-access-role'
to group team2_grp for copy;

grant assumerole
on 'arn:aws:iam::<account id>:role/redshift-s3-onboarding-role,arn:aws:iam::<account id>:role/team2-tenant-redshift-s3-access-role'
to group team2_grp for unload;
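
The per-tenant statements above follow a fixed pattern, so onboarding a third tenant is mostly string substitution. The following Python sketch generates the schema, group, and ASSUMEROLE statements for a new tenant. The function names are hypothetical, user creation is left out because it involves a password, and the validation rules are an assumption meant only to keep unexpected characters out of the generated SQL.

```python
import re
from typing import List

ONBOARDING_ROLE = "redshift-s3-onboarding-role"
TENANT_ROLE_SUFFIX = "-tenant-redshift-s3-access-role"


def chained_role_arns(account_id: str, tenant: str) -> str:
    """Role chain as one comma-separated string: onboarding role, then tenant role."""
    base = f"arn:aws:iam::{account_id}:role/"
    return f"{base}{ONBOARDING_ROLE},{base}{tenant}{TENANT_ROLE_SUFFIX}"


def tenant_onboarding_sql(account_id: str, tenant: str) -> List[str]:
    """Generate the schema, group, and ASSUMEROLE grants for one tenant."""
    # Assumed validation: simple lowercase names and 12-digit account IDs only.
    if not re.fullmatch(r"[a-z][a-z0-9]*", tenant):
        raise ValueError("unexpected tenant name: " + tenant)
    if not re.fullmatch(r"\d{12}", account_id):
        raise ValueError("account ID must be 12 digits")
    chain = chained_role_arns(account_id, tenant)
    group = f"{tenant}_grp"
    return [
        f"create schema {tenant};",
        f"create group {group};",
        f"grant usage on schema {tenant} to group {group};",
        f"grant all on schema {tenant} to group {group};",
        f"grant assumerole on '{chain}' to group {group} for copy;",
        f"grant assumerole on '{chain}' to group {group} for unload;",
    ]


# 123456789012 is a placeholder account ID; team3 is a hypothetical new tenant.
for statement in tenant_onboarding_sql("123456789012", "team3"):
    print(statement)
```

Note that the cluster itself never appears in the generated statements, which reflects the point of role chaining: the only role attached to the cluster is the onboarding role.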


Verify that each tenant can only access its own resources

To verify your access control settings, you can create a test table in each tenant schema and upload a file to the tenant’s S3 bucket using the following commands. You can use the Amazon Redshift query editor or a SQL client tool.

  1. Sign in as team1_usr and enter the following commands:

  2. Sign in as team2_usr and enter the following commands:

  3. Create a file named test-venue.txt with the following contents:
    7|BMO Field|Toronto|ON|0
    16|TD Garden|Boston|MA|0
    23|The Palace of Auburn Hills|Auburn Hills|MI|0
    28|American Airlines Arena|Miami|FL|0
    37|Staples Center|Los Angeles|CA|0
    52|PNC Arena|Raleigh|NC|0
    59|Scotiabank Saddledome|Calgary|AB|0
    66|SAP Center|San Jose|CA|0
    73|Heinz Field|Pittsburgh|PA|65050

  4. Upload this file to both team1 and team2 S3 buckets.
  5. Sign in as team1_usr and enter the following commands to test Amazon Redshift COPY and UNLOAD:
    copy team1.team1_venue
    from 's3://team1-data-<account id>-<region>/'
    iam_role 'arn:aws:iam::<account id>:role/redshift-s3-onboarding-role,arn:aws:iam::<account id>:role/team1-tenant-redshift-s3-access-role'
    delimiter '|' ;
    unload ('select * from team1.team1_venue')
    to 's3://team1-data-<account id>-<region>/unload/' 
    iam_role 'arn:aws:iam::<account id>:role/redshift-s3-onboarding-role,arn:aws:iam::<account id>:role/team1-tenant-redshift-s3-access-role';

The file test-venue.txt uploaded to the team1 bucket is copied to the table team1_venue in the team1 schema, and the data in table team1_venue is unloaded to the team1 bucket successfully.

  6. Replace team1 with team2 in the preceding commands and then run them again, this time signed in as team2_usr.

If you’re signed in as team1_usr and try to use the team2 S3 bucket, the team2 schema or table, or the team2 IAM role when running COPY and UNLOAD, you get an access denied error. You get the same error if you try to access team1 resources while signed in as team2_usr.

Clean up

To clean up the resources you created, delete the CloudFormation stack created in your AWS account.


In this post, we presented a solution to achieve role-based secure data movement between Amazon S3 and Amazon Redshift. This approach combines role chaining with the ASSUMEROLE privilege in Amazon Redshift to allow fine-grained access control over the COPY and UNLOAD commands down to the individual user level within a particular team. This in turn provides finer control over resource isolation and data security in a multi-tenant solution. Many use cases can benefit from this solution as more enterprises build data platforms to provide the foundations for highly scalable, customizable, and secure data consumption models.

About the Authors

Sudipta Mitra is a Senior Data Architect for AWS and is passionate about helping customers build modern data analytics applications by making innovative use of the latest AWS services and their constantly evolving features. He is a pragmatic architect who works backwards from customer needs, makes customers comfortable with the proposed solution, and helps them achieve tangible business outcomes. His main areas of work are Data Mesh, Data Lake, Knowledge Graph, Data Security, and Data Governance.

Michelle Deng is a Sr. Data Architect at Amazon Web Services. She works with AWS customers to provide guidance and technical assistance on database migrations and big data projects.

Jared Cook is a Sr. Cloud Infrastructure Architect at Amazon Web Services. He is committed to driving business outcomes in the cloud, and uses Infrastructure as Code and DevOps best practices to build resilient architectures on AWS. In his leisure time, Jared enjoys the outdoors, music, and playing the drums.

Lisa Matacotta is a Senior Customer Solutions Manager at Amazon Web Services. She works with AWS customers to help them achieve business and strategic goals, understand their biggest challenges, and provide guidance based on AWS best practices to overcome them.

What’s the Diff: File-level vs. Block-level Incremental Backups

If you’ve stumbled upon this blog, chances are you already know that you need to be backing up your data to protect your home or business. Maybe you’re a hobbyist with over 1,000 digital movies in your collection and you lie awake at night, worrying about what would happen if your toddler spills juice on your NAS (let’s face it, toddlers are data disasters waiting to happen). Or you’re a media and entertainment professional worried about keeping archives of your past projects on an on-premises device. Or maybe that tornado that hit your area last week caused you to think twice about keeping all of your data on-premises.

Whether you have a background in IT or not, the many different configuration options for your backup software and cloud storage can be confusing. Today, we’re hoping to clear up one common question when it comes to backup strategies—understanding the difference between file-level and block-level incremental backups.

Refresher: Full vs. Incremental Backups

First things first, let’s define what we’re dealing with: the difference between full and incremental backups. The first step in any backup plan is to perform a full backup of your data. Plan to do this on a slow day because it can take a long time and hog a lot of bandwidth. Of course, if you’re a Backblaze customer, you can also use the Backblaze Fireball to get your data into Backblaze B2 Cloud Storage without taking up precious internet resources.

You should plan on regularly performing full backups because it’s always a good idea to have a fresh, full copy of your entire data set. Some people perform full backups weekly, some might do them monthly or even less often; it’s up to you as you plan your backup strategy.

Then, typically, incremental backups are performed in between your full backups. Want to know more about the difference between full and incremental backups and the considerations for each? Check out our recent blog post on the different types of backups.

What’s the Diff: File-level vs. Block-level Incremental Backups

Let’s take it to the next level. Incremental backups back up what has been changed or added since your last backup, whether that was a full or an incremental one. Within the category of incremental backups, there are two standard options: file-level and block-level incremental backups. Many backup tools and devices, like network attached storage (NAS) devices, offer these options in the configuration settings, so it’s important to understand the difference. After you decide which type of incremental backup is best for you, check your backup software or device’s support articles to see if you can configure this setting for yourself.

File-level Incremental Backups

When a file-level incremental backup is performed and a file has been modified, the entire file is copied to your backup repository. This takes longer than performing a block-level backup because your backup software will scan all your files to see which ones have changed since the last backup and will then back up each modified file again in its entirety.

Imagine that you have a really big file and you make one small change to that file; with file-level backups, the whole file is re-uploaded. This likely sounds pretty inefficient, but there are some advantages to a file-level backup:

  • It’s simple and straightforward.
  • It allows you to pick and choose the files you want backed up.
  • You can include or exclude certain file types or easily back up specific directories.

File-level backups might be the right choice for you if you’re a home techie who wants to back up their movie collection, knowing that those files are not likely to change. Or it could be a good fit for a small business with a small amount of data that isn’t frequently modified.
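
To make the file-level idea concrete, here’s a minimal sketch in Python. It decides what to upload by comparing a SHA-256 digest of each file against a manifest saved at the previous backup. That’s an assumption for illustration: real backup tools often compare modification times and sizes instead, and the function names here are made up.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def file_digest(path: Path) -> str:
    """SHA-256 of a file, read in 1 MiB chunks so large files fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def snapshot(root: Path) -> Dict[str, str]:
    """Manifest of relative path -> digest for every file under root."""
    return {
        path.relative_to(root).as_posix(): file_digest(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def files_to_upload(root: Path, previous: Dict[str, str]) -> List[str]:
    """New or modified files; each one is re-uploaded in its entirety."""
    current = snapshot(root)
    return [name for name, digest in current.items()
            if previous.get(name) != digest]
```

You would call snapshot after each backup, store the manifest, and pass it to files_to_upload the next time. Notice that a one-byte change to a huge file still puts the whole file on the upload list, which is exactly the inefficiency described above.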

The diagram below illustrates this concept. This person performs their full backup on Sundays and Wednesdays. (To be clear, we’re not recommending this cadence—it’s just for demonstration purposes.) This results in a 100% copy of their data to a backup repository like Backblaze B2 Cloud Storage. On Monday, part of a file is changed (the black triangle) and a new file is added (the red square). The file-level incremental backup uploads the new file (the red square) and the entire file that has changed (the grey square with the black triangle). On Tuesday, another file is changed (the purple triangle). When the file-level incremental backup is performed, it adds the entire file (the grey square with the purple triangle) to the backup repository. On Wednesday, a new full backup is run, which creates a complete copy of the source data (including all your previously changed and added data) and stores that in the cloud. This starts the cycle of full backups to incremental backups over again.

Block-level Incremental Backups

Block-level incremental backups do not copy the entire file if only a portion of it has changed. With this option, only the changed part of the file is sent to the backup repository. Because of this, block-level backups are faster and require less storage space. If you’re backing up to cloud storage, obviously this will help you save on storage costs.

Let’s return to our scenario where full backups are performed on Sundays and Wednesdays, but this time, block-level incrementals are being run in between. When the first block-level incremental backup is run on Monday, the backup software copies just the changed piece of data in the file (the black triangle) and the new data (the red square). In the Tuesday backup, the additional modified data in another file (the purple triangle) is also added to the backup repository. On Wednesday, the new full backup results in a fresh copy of the full data set to the cloud.
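To make the difference concrete, here is a minimal Python sketch of the block-level idea. It assumes fixed-size blocks compared by hash, which is a simplification; real backup software typically works from volume snapshots and its own change tracking. The tiny block size and the function names are purely illustrative.

```python
import hashlib

BLOCK_SIZE = 4  # unrealistically small so the example is easy to follow


def split_blocks(data: bytes, size: int = BLOCK_SIZE) -> list:
    return [data[i:i + size] for i in range(0, len(data), size)]


def changed_blocks(old: bytes, new: bytes, size: int = BLOCK_SIZE) -> dict:
    """Map block index -> new block contents, only for blocks that differ."""
    old_hashes = [hashlib.sha256(b).digest() for b in split_blocks(old, size)]
    delta = {}
    for index, block in enumerate(split_blocks(new, size)):
        is_new = index >= len(old_hashes)
        if is_new or hashlib.sha256(block).digest() != old_hashes[index]:
            delta[index] = block
    return delta


def apply_delta(old: bytes, delta: dict, new_length: int,
                size: int = BLOCK_SIZE) -> bytes:
    """Rebuild the new file from the old copy plus the changed blocks."""
    blocks = split_blocks(old, size)
    for index, block in sorted(delta.items()):
        if index < len(blocks):
            blocks[index] = block
        else:
            blocks.append(block)
    return b"".join(blocks)[:new_length]


old = b"AAAABBBBCCCC"
new = b"AAAABxBBCCCCDD"
delta = changed_blocks(old, new)
restored = apply_delta(old, delta, len(new))
print(delta, restored == new)
```

Only two small blocks are sent instead of the whole file. The `apply_delta` step also hints at the cost on restore: the file has to be reassembled from the earlier copy plus every changed piece.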

Block-level incremental backups take a snapshot of the running volume and data is read from the snapshot. This allows files to be copied even if they’re currently in use in a running software program, and it also reduces the impact on your machine’s performance while the backup is running.

This backup type works better than file-level incremental backups when you have a large number of files or files that often change. If you don’t need to pick and choose which files to specifically include or exclude in your backup, it’s generally best to use block-level incremental backups, as they’re more efficient.

Block-level incremental backups have two drawbacks. First, recovery may take longer, since your backup software needs to recover each piece of modified data and rebuild the file. Second, because this style of incremental backup uploads modified data in pieces, a single piece that becomes corrupted or can’t be recovered could affect your ability to recover the whole file. For this reason (and plenty of other good reasons), it’s important to regularly include full backups in your backup strategy and not count on incremental backups perpetually.

Ready to Get Started?

No matter which method of incremental backup you decide is right for you, you can take advantage of Backblaze’s extremely affordable B2 Cloud Storage at just $5/TB/month. Back up your servers or your NAS in a matter of minutes and enjoy the peace of mind that comes with knowing you’re protected from a data disaster.

The post What's the Diff: File-level vs. Block-level Incremental Backups appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

Microsoft Issues Report of Russian Cyberattacks against Ukraine

Microsoft has a comprehensive report on the dozens of cyberattacks — and even more espionage operations — Russia has conducted against Ukraine as part of this war:

At least six Russian Advanced Persistent Threat (APT) actors and other unattributed threats, have conducted destructive attacks, espionage operations, or both, while Russian military forces attack the country by land, air, and sea. It is unclear whether computer network operators and physical forces are just independently pursuing a common set of priorities or actively coordinating. However, collectively, the cyber and kinetic actions work to disrupt or degrade Ukrainian government and military functions and undermine the public’s trust in those same institutions.


Threat groups with known or suspected ties to the GRU have continuously developed and used destructive wiper malware or similarly destructive tools on targeted Ukrainian networks at a pace of two to three incidents a week since the eve of invasion. From February 23 to April 8, we saw evidence of nearly 40 discrete destructive attacks that permanently destroyed files in hundreds of systems across dozens of organizations in Ukraine.

[$] Printbuf rebuffed for now

There is a long and growing list of options for getting information out of
the kernel but, in the real world, print statements still tend to be the
tool of choice. The kernel’s printk()
often comes up short, despite the fact that it provides a set of
kernel-specific features, so there has, for some time, been interest in
better APIs for textual output from the kernel. The “printbuf”
from Kent Overstreet is one step in that direction, but will
need some changes to make it work well with features the kernel already

Secret Management with HashiCorp Vault

Many applications these days require authentication to external systems with resources, such as users and passwords to access databases and service accounts to access cloud services, and so on. In such cases, private information, like passwords and keys, becomes necessary. It is essential to take extra care in managing such sensitive data. For example, if you write your AWS key information or password in a script for deployment and then push it to a Git repository, all users who can read it will also be able to access it, and you could be in trouble. Even if it’s an internal repository, you run the risk of a potential leak.

How we were managing secrets in the service

Before we talk about Vault, let’s take a look at how we used to manage secrets.


We use SaltStack as a bare-metal configuration management tool. The core of the Salt ecosystem consists of two major components: the Salt Master and the Salt Minion. The configuration state is owned by Salt Master, and thousands of Salt Minions automatically install packages, generate configuration files, and start services to the node based on the state. The state may contain secrets, such as passwords and API keys. When we deploy secrets to the node, we encrypt plaintext using a Salt Master owned GPG key and fill an ASCII-armored secret into the state file. Once it is applied, the Salt Master decrypts the PGP message using its own key, then the Salt Minion retrieves rendered data from the Master.


We were using Lockbox, a secure way to store your Kubernetes secrets offline. The secret is asymmetrically encrypted and can only be decrypted with the Lockbox Kubernetes controller. The controller synchronizes with Secret objects. A Secret generated from Lockbox will also be created in the corresponding namespace. Since namespaces have been assigned administrator privileges by each engineering team, ordinary users cannot read Secret objects.

Why these secret management approaches were insufficient

Prior to Vault, GnuPG and Lockbox were used in this way to encrypt and decrypt most secrets in the data center. Nevertheless, they were inadequate in certain cases:

  • Lack of scoping secrets: The secret data in ASCII-armor could only be decrypted by a specific node when the client read it. This was still not enough control. Salt owns a GPG key for each Salt Master, and Core services (k8s, Databases, Storage, Logging, Tracing, Monitoring, etc) are deployed to hundreds of Salt Minions by a few Salt Masters. Nodes are often reused as different services after repairing hardware failure, so we use the same GPG key to decrypt the secrets of various services. Therefore, having a GPG key for each service is complicated. Also, a specific secret is used only for a specific service. For example, an access key for object storage is needed to back up the repository. In previous configurations, the API key is decrypted by a common Salt Master, so there is a risk that the API key will be referenced by another service or for another purpose. It is impossible to scope secret access, as long as we use the same GPG key.

    Another case is Kubernetes. Namespace-scoped access control and API access restrictions by the RBAC model are excellent. However, the etcd store that Kubernetes uses is not encrypted by default, and Secret objects are saved there. We need to think about encryption at rest with a third-party KMS, or about how to prevent Secrets from being stored in etcd. In other words, access to the secret data itself also needs to be properly controlled.

  • Rotation and static secret: Anyone who has access to the Salt Master GPG key can theoretically decrypt all current and future secrets. And as long as we have many secrets, it’s impossible to rotate the encryption of all the secrets. Current and future secret management requires a process for easy rotation and using dynamically generated secrets instead.
  • Session management: Users/Services with GPG keys can decrypt secrets at any time. So GPG secret decryption is like having no TTL. (You can set an expiration date against the GPG key, but it’s just metadata. If you try to encrypt a new secret, after the expiration date, you’ll get a warning, but you can decrypt the existing secret). A temporary session is required to limit access when not needed.
  • Audit: GPG doesn’t have a way to keep an audit trail. Audit trails help us trace who read a secret, when, and from where. The audit trail should contain details including the date, time, and user information associated with each secret read (and login), which is required regardless of whether the reader is a user or a service.

HashiCorp Vault

Armed with our set of requirements, we chose HashiCorp Vault to improve secret management with a stronger security model.

  • Scoping secrets: When a client logs in, a Vault token is generated through the Auth method (backend). This token carries policies that define access, so it is clear what data the client can access after logging in.
  • Rotation and dynamic secret: Version-controlled static secret with KV V2 Secret Engine helps us to easily update/rollback secrets with a single request. In addition, dynamic secrets and credentials are available to eliminate manual rotation. Ideally, these are required to be short-lived and have frequent rotation. Service should have restricted access. These are essential to reduce the impact of an attack, but they are operationally difficult, and it is impossible to satisfy them without automation. Vault can solve this problem by allowing operators to provide dynamically generated credentials to their services. Vault manages the credential lifecycle and rotates and revokes it as needed.
  • Session management: Vault provides a login process to get the token and various auth methods are provided. It is possible to link with an Identity Provider and authenticate using JWT. Since the vault token has a TTL, it can be managed as a short-lived credential to access secrets.
  • Audit: Vault supports audit that records who accessed which Vault API, when, and from where.
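How scoping, TTLs, and auditing fit together can be shown with a toy model. The sketch below is not Vault’s API or policy language: the `Token` class, the `read_secret` helper, the paths, and the glob matching via `fnmatch` are all assumptions made for illustration.

```python
import fnmatch
from dataclasses import dataclass


@dataclass
class Token:
    """Toy stand-in for a token: path patterns, capabilities, and a TTL."""
    policies: dict       # path pattern -> set of capabilities
    ttl_seconds: float
    issued_at: float

    def expired(self, now: float) -> bool:
        return now - self.issued_at >= self.ttl_seconds

    def allows(self, path: str, capability: str, now: float) -> bool:
        if self.expired(now):
            return False  # a short-lived token stops working by itself
        return any(
            fnmatch.fnmatchcase(path, pattern) and capability in caps
            for pattern, caps in self.policies.items()
        )


audit_log = []


def read_secret(token: Token, client: str, path: str, now: float) -> bool:
    """Check access and record who asked for what, when, and the outcome."""
    allowed = token.allows(path, "read", now)
    audit_log.append(
        {"client": client, "path": path, "time": now, "allowed": allowed}
    )
    return allowed


# A backup job that may only read secrets under its own (hypothetical) path.
token = Token({"secret/data/backup/*": {"read"}}, ttl_seconds=60, issued_at=0.0)
in_scope = read_secret(token, "backup-job", "secret/data/backup/s3-key", now=10)
out_of_scope = read_secret(token, "backup-job", "secret/data/db/password", now=10)
after_ttl = read_secret(token, "backup-job", "secret/data/backup/s3-key", now=61)
print(in_scope, out_of_scope, after_ttl, len(audit_log))
```

The same request succeeds inside the token’s scope and lifetime, and fails outside either one, while every attempt leaves an audit record. With a shared GPG key, none of these three checks exist.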

We also built Vault clusters for HA, Reliability, and handling large numbers of requests.

  • We use Integrated Storage, so every node in the Vault cluster holds a replicated copy of Vault’s data. A client can retrieve the same result from any node.
  • Performance Replication lets clients get the same result from any of the Vault clusters.
  • Requests from clients are routed from a single Service IP to one of the Clusters. Anycast routes incoming traffic to the nearest cluster that handles requests efficiently. If one cluster goes down, the request will be automatically routed to another available cluster.
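The failover behaviour can be pictured with a small simulation. In practice anycast routing happens at the network layer, not in application code, so this is only a mental model; the cluster names and distances are invented for the example.

```python
from typing import Optional


def pick_cluster(clusters: dict) -> Optional[str]:
    """Return the nearest healthy cluster, or None if all are down.

    `clusters` maps a name to (distance, healthy); both are illustrative.
    """
    healthy = {name: dist for name, (dist, ok) in clusters.items() if ok}
    return min(healthy, key=healthy.get) if healthy else None


clusters = {"dc-a": (5, True), "dc-b": (12, True), "dc-c": (40, True)}
before_outage = pick_cluster(clusters)

clusters["dc-a"] = (5, False)  # the nearest cluster goes down
after_outage = pick_cluster(clusters)
print(before_outage, after_outage)
```

Because replication gives every cluster the same data, the client does not need to know which cluster answered.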

Service integrations

We use the appropriate Auth backend and Secret Engine to integrate Vault with the service responsible for each core component.


The configuration state is owned by Salt Master, and hundreds of Salt Minions automatically install packages, generate configuration files, and start services on the node based on the role. The state data may contain secrets, such as API keys, and Salt Minion retrieves them from Vault. Salt uses a JWT signed by the Salt Master to log in to Vault using the JWT Auth method.


Kubernetes reads Vault secrets through an operator that synchronizes with Secret objects. The Kubernetes Auth method uses the Service Account token JWT to log in, just like the JWT Auth method. This JWT contains the service account name, UID, and namespace. Vault can scope access per namespace using a dynamic policy.
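As a rough illustration of namespace scoping, the sketch below reads the claims from a JWT payload and fills a path template with the namespace. The claim names, the template, and the demo token are hypothetical, and the signature is deliberately not verified here, which a real login must always do before trusting any claim.

```python
import base64
import json


def b64url_encode(raw: bytes) -> str:
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode()


def b64url_decode(segment: str) -> bytes:
    padding = "=" * (-len(segment) % 4)  # JWT segments drop base64 padding
    return base64.urlsafe_b64decode(segment + padding)


def unverified_claims(jwt: str) -> dict:
    """Decode the payload only. No signature check: for illustration only."""
    _header, payload, _signature = jwt.split(".")
    return json.loads(b64url_decode(payload))


def scoped_path(claims: dict, template: str = "secret/data/{namespace}/*") -> str:
    """Fill a (hypothetical) policy path template from the token's namespace."""
    return template.format(namespace=claims["namespace"])


# Build an unsigned demo token with made-up claim names.
demo_claims = {"service_account": "api", "uid": "1234", "namespace": "payments"}
demo_jwt = ".".join([
    b64url_encode(b'{"alg":"none"}'),
    b64url_encode(json.dumps(demo_claims).encode()),
    "",
])

path = scoped_path(unverified_claims(demo_jwt))
print(path)
```

The point is that one templated rule can cover every namespace, instead of one hand-written policy per team.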

Identity Provider – User login

Additionally, Vault can work with the Identity Provider through a delegated authorization method based on OAuth 2.0 so that users can get tokens with the right policies. The JWT issued by the Identity Provider contains the group or user ID to which it belongs, and this metadata can be used to assign a Vault policy.

Integrated ecosystem – Auth x Secret

Vault provides a plugin system for two major components: authentication (Auth method) and secret management (Secret Engine). Vault can enable the officially provided plugins and the custom plugins you can build. The Auth method provides authentication for obtaining a Vault token by various methods. As mentioned in the service integration example above, we mainly use JWT, OIDC, and Kubernetes for login. On the other hand, the secret engine provides secrets in various ways, such as KV for a static secret, PKI for certificate signing, issuing, etc.

Together they form an ecosystem. Vault can easily integrate auth methods and secret engines with each other. For instance, if we add a DB dynamic credential secret engine, all existing platforms are instantly supported, without needing to reinvent how they authenticate to a separate service. Similarly, we can add a platform into the mix, and it instantly has access to all the existing secret engines and their functionality. Additionally, Vault can enforce permissions on the arbitrary endpoint paths provided by secret engines, based on the authentication method and policies.

Wrap up

Vault integration for the core components is already under way, and many GPG secrets have been migrated to Vault. We aim to extend service integrations across our data centers, adopt dynamic credentials, and improve CI/CD for Vault. Interested? We’re hiring for security platform engineering!

Security updates for Thursday

Security updates have been issued by Debian (chromium, golang-1.7, and golang-1.8), Fedora (bettercap, chisel, containerd, doctl, gobuster, golang-contrib-opencensus-resource, golang-github-appc-docker2aci, golang-github-appc-spec, golang-github-containerd-continuity, golang-github-containerd-stargz-snapshotter, golang-github-coredns-corefile-migration, golang-github-envoyproxy-protoc-gen-validate, golang-github-francoispqt-gojay, golang-github-gogo-googleapis, golang-github-gohugoio-testmodbuilder, golang-github-google-containerregistry, golang-github-google-slothfs, golang-github-googleapis-gnostic, golang-github-googlecloudplatform-cloudsql-proxy, golang-github-grpc-ecosystem-gateway-2, golang-github-haproxytech-client-native, golang-github-haproxytech-dataplaneapi, golang-github-instrumenta-kubeval, golang-github-intel-goresctrl, golang-github-oklog, golang-github-pact-foundation, golang-github-prometheus, golang-github-prometheus-alertmanager, golang-github-prometheus-node-exporter, golang-github-prometheus-tsdb, golang-github-redteampentesting-monsoon, golang-github-spf13-cobra, golang-github-xordataexchange-crypt, golang-gopkg-src-d-git-4, golang-k8s-apiextensions-apiserver, golang-k8s-code-generator, golang-k8s-kube-aggregator, golang-k8s-sample-apiserver, golang-k8s-sample-controller, golang-mongodb-mongo-driver, golang-storj-drpc, golang-x-perf, gopass, grpcurl, onionscan, shellz, shhgit, snowcrash, stb, thunderbird, and xq), Oracle (gzip, kernel, and polkit), Slackware (curl), SUSE (buildah, cifs-utils, firewalld, golang-github-prometheus-prometheus, libaom, and webkit2gtk3), and Ubuntu (nginx and thunderbird).

Graph Networks – Striking fraud syndicates in the dark

Post Syndicated from Grab Tech original

As a leading superapp in Southeast Asia, Grab serves millions of consumers daily. This naturally makes us a target for fraudsters. To enhance our defences, the Integrity team at Grab has launched several hyper-scaled services, such as the Griffin real-time rule engine and Advanced Feature Engineering. These systems enable data scientists and risk analysts to develop real-time scoring, and take fraudsters out of our ecosystems.

Apart from individual fraudsters, we have also observed the fast evolution of the dark side over time. We have had to evolve our defences to deal with professional syndicates that use advanced equipment such as device farms and GPS spoofing apps to perform fraud at scale. These professional fraudsters are able to camouflage themselves as normal users, making it significantly harder to identify them with rule-based detection.

Since 2020, Grab’s Integrity team has been advancing fraud detection with more sophisticated techniques and experimenting with a range of graph network technologies such as graph visualisations, graph neural networks and graph analytics. We’ve seen a lot of progress in this journey and will be sharing some key learnings that might help other teams who are facing similar issues.

What are Graph-based Prediction Platforms?

“You can fool some of the people all of the time, and all of the people some of the time, but you cannot fool all of the people all of the time.” – Abraham Lincoln

A Graph-based Prediction Platform connects multiple entities through one or more common features. When such entities are viewed as a macro graph network, we uncover new patterns that are otherwise unseen to the naked eye. For example, when investigating if two users are sharing IP addresses or devices, we might not be able to tell if they are fraudulent or just family members sharing a device.

However, if we use a graph system and look at all users sharing this device or IP address, it could show us if these two users are part of a much larger syndicate network in a device farming operation. In operations like these, we may see up to hundreds of other fake accounts that were specifically created for promo and payment fraud. With graphs, we can identify fraudulent activity more easily.
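The idea of looking at everyone connected through shared devices or IP addresses can be sketched with a small union-find over user-to-device links. This is a toy illustration with made-up data, not a description of Grab’s platform.

```python
from collections import defaultdict


def find(parent: dict, node):
    """Find the representative of a node's group, compressing the path."""
    while parent[node] != node:
        parent[node] = parent[parent[node]]
        node = parent[node]
    return node


def user_clusters(links: list) -> list:
    """Group users that are connected through any chain of shared items."""
    parent = {}
    for user, shared in links:
        user_node, shared_node = ("user", user), ("shared", shared)
        parent.setdefault(user_node, user_node)
        parent.setdefault(shared_node, shared_node)
        root_a, root_b = find(parent, user_node), find(parent, shared_node)
        if root_a != root_b:
            parent[root_a] = root_b
    groups = defaultdict(set)
    for node in parent:
        if node[0] == "user":
            groups[find(parent, node)].add(node[1])
    return sorted((sorted(g) for g in groups.values()),
                  key=lambda g: (-len(g), g))


# Hypothetical observations: (user, device or IP address they were seen on).
links = [
    ("alice", "device-1"), ("bob", "device-1"),
    ("bob", "ip-9"), ("carol", "ip-9"),
    ("dave", "device-2"),
]
clusters = user_clusters(links)
print(clusters)
```

Looking only at alice and bob, a shared device could be a family phone. The larger cluster that also pulls in carol through a shared IP address is the kind of pattern that only shows up at the graph level.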

Grab’s Graph-based Prediction Platform

Leveraging the power of graphs, the team has primarily built two types of systems:

  • Graph Database Platform: An ultra-scalable storage system with over one billion nodes that powers:
    1. Graph Visualisation: Risk specialists and data analysts can review user connections in real time and are able to quickly capture new fraud patterns with over 10 dimensions of features (see Fig 1).

      Fig 1: Graph visualisation
    2. Network-based feature system: A configurable system for engineers to adjust machine learning features based on network connectivity, e.g. number of hops between two users, number of shared devices between two IP addresses.

  • Graph-based Machine Learning: Unlike traditional fraud detection models, Graph Neural Networks (GNN) are able to utilise the structural correlations on the graph and act as a sustainable foundation to combat many different kinds of fraud. The data science team has built large-scale GNN models for scenarios like anti-money laundering and fraud detection.

    Fig 2 shows a Money Laundering Network where hundreds of accounts coordinate the placement of funds, layer the illicit monies through a complex web of transactions to make the funds hard to trace, and consolidate the funds into spending accounts.

Fig 2: Money Laundering Network
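The two network-based features mentioned earlier, number of hops between two users and number of shared devices between two IP addresses, can be computed on a small in-memory graph as follows. This is an illustrative sketch with invented data; a production system would query a graph database instead.

```python
from collections import deque


def hops(graph: dict, start: str, goal: str):
    """Shortest number of edges between two users, or None if unconnected."""
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, distance = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour == goal:
                return distance + 1
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, distance + 1))
    return None


def shared_device_count(devices_by_ip: dict, ip_a: str, ip_b: str) -> int:
    """How many devices were seen on both IP addresses."""
    return len(devices_by_ip.get(ip_a, set()) & devices_by_ip.get(ip_b, set()))


# Hypothetical user-to-user connections (undirected).
graph = {"u1": {"u2"}, "u2": {"u1", "u3"}, "u3": {"u2"}, "u4": set()}
devices_by_ip = {
    "10.0.0.1": {"d1", "d2", "d3"},
    "10.0.0.2": {"d2", "d3", "d4"},
}

print(hops(graph, "u1", "u3"), hops(graph, "u1", "u4"))
print(shared_device_count(devices_by_ip, "10.0.0.1", "10.0.0.2"))
```

Values like these can then be fed to a model as ordinary numeric features.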

What’s next?

In the next article of our Graph Network blog series, we will dive deeper into how we develop the graph infrastructure and database using AWS Neptune. Stay tuned for the next part.

Join us

Grab is the leading superapp platform in Southeast Asia, providing everyday services that matter to consumers. More than just a ride-hailing and food delivery app, Grab offers a wide range of on-demand services in the region, including mobility, food, package and grocery delivery services, mobile payments, and financial services across 428 cities in eight countries.

Powered by technology and driven by heart, our mission is to drive Southeast Asia forward by creating economic empowerment for everyone. If this mission speaks to you, join our team today!

Handy Tips #28: Keeping track of your services with business service monitoring

Configure and deploy flexible business services and monitor the availability of your business and its individual components.

The availability of a business service tends to depend on the state of many interconnected components. Therefore, detecting the current state of a business service requires a sufficiently complex and flexible monitoring logic.

Define flexible business service trees and stay informed about the state of your business services:

  • Business services can depend on an unlimited number of underlying components
  • Select from multiple business service status propagation rules
  • Calculate the business service state based on the weight of the business service components
  • Receive alerts whenever your business service is unavailable

Check out the video to learn how to configure business service monitoring.

How to configure business service monitoring:

  1. Navigate to Services – Services
  2. Click the Edit button and then click the Create service button
  3. For this example, we will define an Online store business service
  4. Name your service and mark the Advanced configuration checkbox
  5. Click the Add button under the Additional rules
  6. Set the service status and select the conditions
  7. For this example, we will set the status to High
  8. We will use the condition “If weight of child services with Warning status or above is at least 6”
  9. Set the Status calculation rule to Set status to OK
  10. Press the Add button
  11. Press the Add child service button
  12. For this example, we will define Web server child services
  13. Provide a child service name and a problem tag
  14. For our example, we will use node name Equals node # tag
  15. Mark the Advanced configuration checkbox and assign the service weight
  16. Press the Add button
  17. Repeat steps 11 – 16 and add additional child services
  18. Simulate a problem on your services so the summary weight is >= 6
  19. Navigate to Services – Services and check the parent service state
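The logic of the additional rule from the example can be sketched as plain Python. This is a simplified model of the rule configured above, not Zabbix code; the severity ordering and the function name are assumptions for the illustration.

```python
# Severity order used for the comparison (lowest to highest).
SEVERITY = ["OK", "Not classified", "Information", "Warning",
            "Average", "High", "Disaster"]


def parent_status(children: list, at_least: str = "Warning",
                  min_weight: int = 6, new_status: str = "High") -> str:
    """Apply the rule: if the weight of child services with `at_least`
    status or above reaches `min_weight`, set `new_status`; otherwise
    fall back to OK (the status calculation rule chosen in the example).
    """
    rank = SEVERITY.index
    total = sum(weight for status, weight in children
                if rank(status) >= rank(at_least))
    return new_status if total >= min_weight else "OK"


# Three web server child services, each with weight 3 (illustrative values).
one_problem = [("Warning", 3), ("OK", 3), ("OK", 3)]
two_problems = [("Warning", 3), ("OK", 3), ("High", 3)]

print(parent_status(one_problem), parent_status(two_problems))
```

With a weight of 3 per node, one failing web server leaves the Online store service in OK, while two failing nodes reach the threshold of 6 and raise it to High.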

Tips and best practices
  • Service actions can be defined in the Services → Service actions menu section
  • Service root cause problem can be displayed in notifications with the {SERVICE.ROOTCAUSE} macro
  • Service status will not be propagated to the parent service if the status propagation rule is set to Ignore this service
  • Service-level tags are used to identify a service. Service-level tags are not used to map problems to the service

The post Handy Tips #28: Keeping track of your services with business service monitoring appeared first on Zabbix Blog.

A cybersecurity club for girls | Hello World #18

In this article adapted from Hello World issue 18, teacher Babak Ebrahim explains how his school uses a cybersecurity club to increase interest in Computing among girls. Babak is a Computer Science and Mathematics teacher at Bishop Challoner Catholic College Secondary in Birmingham, UK. He is a CAS Community Leader, and works as a CS Champion for the National Centre for Computing Education in England.

Cybersecurity for girls

It is impossible to walk into an upper-secondary computer science lesson and not notice the number of boys compared to girls. This is a common issue across the world; it is clear from reading community forums and news headlines that there is a big gap in female representation in computing. To combat this problem in my school, I started organising trips to local universities and arranging assembly talks for my Year 9 students (aged 13–14). Although this was helpful, it didn’t have as much impact as I expected on improving female representation.

Girls do a cybersecurity activity at a school club.
Girls engage in a cryptography activity at the club.

This led me to alter our approach and target younger female students with an extracurricular club. As part of our lower-secondary curriculum, all pupils study encryption and cryptography, and we were keen to extend this interest beyond lesson time. I discovered the CyberFirst Girls Competition, aimed at Year 8 girls in England (aged 12–13) with the goal of influencing girls when choosing their GCSE subjects (qualifications pupils take aged 14–16). Each school can enter as many teams as they like, with a maximum of four girls in each team. I advertised the event by showing a video of the previous year’s attendees and the winning team. To our delight, 19 girls, in five teams, entered the competition.

Club activities at school

To make sure that this wasn’t a one-off event, we started an after-school cybersecurity club for girls. All Computing teachers encouraged their female students to attend. We had a number of female teachers who were teaching Maths and Computing as their second subjects, and I found it more effective when these teachers encouraged the girls to join. They would also help with running the club. We found it to be most popular with Year 7 students (aged 11–12), with 15 girls regularly attending. We often do cryptography tasks in the club, including activities from established competitions. For example, I recently challenged the club to complete tasks from the most recent Alan Turing Cryptography Competition. A huge benefit of completing these tasks in the club, rather than in the classroom, was that students could work more informally and were not under pressure to succeed. I found this year’s tasks quite challenging for younger students, and I was worried that this could put them off returning to the club. To avoid this, I first taught the students the skills that they would need for one of the challenges, followed by small tasks that I made myself over two or three sessions.

Three teenage girls at a laptop

For example, one task required students to use the Playfair cipher to break a long piece of code. In order to prepare students for decoding this text, I showed them how the cipher works, then created empty grids (5 x 5 tables) and modelled the technique with simple examples. The girls then worked in teams of two to encrypt a short quote. I gave each group a different quotation, and they weren’t allowed to let other groups know what it was. Once they applied the cipher, they handed the encrypted message to another group, whose job was to decrypt it. At this stage, some would identify that the other group had made mistakes using the techniques, and they would go through the text together to identify them. Once students were confident and competent in using this cipher, I presented them with the competition task, and they then applied the same process. Of course, some students would still make mistakes, but they would realise this and be able to work through them, rather than being overwhelmed by them. Another worthwhile activity in the club has been for older pupils, who are in their second year of attending, to mentor and support girls in the years below them, especially in preparation for participating in competitions.
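For teachers who want to generate or check exercises like these, here is a short Python sketch of the Playfair cipher. It follows the common textbook conventions (I and J share a cell, X is inserted between doubled letters and used as padding), which may differ slightly from the variant used in a given competition. The key and message are the well-known example from reference texts, not from the club.

```python
import string


def build_square(key: str) -> list:
    """Return the 25 letters of the 5 x 5 grid in row order (J merged into I)."""
    square = []
    for ch in (key + string.ascii_uppercase).upper():
        ch = "I" if ch == "J" else ch
        if ch in string.ascii_uppercase and ch not in square:
            square.append(ch)
    return square


def digraphs(text: str) -> list:
    """Split text into letter pairs, separating doubles and padding the end."""
    letters = ["I" if c == "J" else c
               for c in text.upper() if c in string.ascii_uppercase]
    pairs, i = [], 0
    while i < len(letters):
        a = letters[i]
        filler = "Q" if a == "X" else "X"
        b = letters[i + 1] if i + 1 < len(letters) else filler
        if a == b:
            pairs.append((a, filler))
            i += 1
        else:
            pairs.append((a, b))
            i += 2
    return pairs


def transform(pairs: list, square: list, shift: int) -> str:
    """Apply the row, column, and rectangle rules (shift +1 encrypts, -1 decrypts)."""
    pos = {ch: divmod(i, 5) for i, ch in enumerate(square)}
    out = []
    for a, b in pairs:
        (ra, ca), (rb, cb) = pos[a], pos[b]
        if ra == rb:      # same row: move along the row
            out += [square[ra * 5 + (ca + shift) % 5],
                    square[rb * 5 + (cb + shift) % 5]]
        elif ca == cb:    # same column: move along the column
            out += [square[((ra + shift) % 5) * 5 + ca],
                    square[((rb + shift) % 5) * 5 + cb]]
        else:             # rectangle: swap the columns
            out += [square[ra * 5 + cb], square[rb * 5 + ca]]
    return "".join(out)


def encrypt(plaintext: str, key: str) -> str:
    return transform(digraphs(plaintext), build_square(key), +1)


def decrypt(ciphertext: str, key: str) -> str:
    pairs = [(ciphertext[i], ciphertext[i + 1])
             for i in range(0, len(ciphertext), 2)]
    return transform(pairs, build_square(key), -1)


secret = encrypt("Hide the gold in the tree stump", "playfair example")
print(secret)
print(decrypt(secret, "playfair example"))
```

Decryption returns the message with the inserted X still in place (TREXESTUMP), which is a useful talking point: students have to spot and remove the filler letters themselves.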

Trips afield

Other club activities have included a trip to Bletchley Park. As a part of the package, students took part in a codebreaking workshop in which they used the Enigma machine to crack encrypted messages. This inspirational trip was a great experience for the girls, as they discovered the pivotal roles women had in breaking codes during the Second World War. If you’re not based in the UK, Bletchley Park also runs a virtual tour and workshops. You could also organise a day trip to a local university where students could attend different workshops run by female lecturers or university students; this could involve a mixture of maths, science, and computer science activities.

We are thrilled to learn that one of our teams won this year’s CyberFirst Girls Competition! More importantly, the knowledge gained by all the students who attend the club is most heartening, along with the enthusiasm that is clearly evident each week, and the fun that is had. Whether this will have any impact on the number of girls who take GCSE Computer Science remains to be seen, but it certainly gives the girls the opportunity to discover their potential, learn the importance of cybersecurity, and consider pursuing a career in a male-dominated profession. There are many factors that influence a child’s mind as to what they would like to study or do, and every little extra effort that we put into their learning journey will shape who they will become in the future.

What next?

Find out more about teaching cybersecurity

Find out more about the factors influencing girls’ and young women’s engagement in Computing

  • We are currently completing a four-year programme of research about gender balance in computing. Find out more about this research programme.
  • At our research seminar series, we welcomed Peter Kemp and Billy Wong last year, who shared results from their study of the demographics of students who choose GCSE Computer Science in England. Watch the seminar recording.
  • Katharine Childs from our team has summarised the state of research about gender balance in computing. Watch her seminar, or read her report.
  • Last year, we hosted a panel session to learn from various perspectives on gender balance in computing. Watch the panel recording.

The post A cybersecurity club for girls | Hello World #18 appeared first on Raspberry Pi.

Russia, Our Liberator

Russia has liberated us once again, and unintentionally, because the aim was to intimidate. It cut off gas supplies to a country that depends on them for 90% of its needs, stripped away President Rumen Radev’s Euro-Atlantic disguise, and brought out the leadership in Kiril Petkov and Asen Vassilev (especially Vassilev). What happened is something no Bulgarian politician had dared to do, because of dependencies, servility towards the Kremlin, and personal gain.

Without Russian gas, corruption decreases

Moscow halted gas deliveries to Bulgaria in an attempt at political destabilisation after the government refused to pay for the gas in rubles. Thus the EU country most dependent on Russian gas was punished by having to buy gas from a supplier other than Gazprom, to seek partnerships with neighbouring European countries, and to hurry to finish projects that have dragged on for more than 10 years, such as the interconnector with Greece. In other words, to do everything that Bulgarian statesmen should already have done in the name of the national interest.

As with its first freedom, which came as a by-product of the battle for the Black Sea straits, Bulgaria will pay dearly for this one too. The occupation debt for the Liberation was calculated at 89,640,000 gold leva (32 tonnes of gold), of which the Principality of Bulgaria paid nearly one third. For the occupation after the Second World War, Bulgaria paid over 133 billion leva, or over 300 million dollars at the time. According to a publication by “168 Chasa”, the sum for the upkeep of Soviet officers and soldiers in the period 1944–1947 varied between 375 million and 1 billion leva per month; for comparison, the Bulgarian budget was about 42 billion leva. And besides the Bulgarian archives, the Red Army also carried off 164 factories to Russia.

Bulgarian taxpayers, the poorest in the EU, will also pay dearly for the halted Russian natural gas. However much the Energy and Water Regulatory Commission holds back the increase in the price of gas, district heating companies cannot go on buying very expensive gas and selling to their customers at much lower prices. So alternative supplies mean more expensive heating, even more expensive bread, reduced competitiveness for the products of the glass, metallurgical, and fertiliser industries, and new difficulties for gasified municipalities in paying even higher bills for the gas consumed by kindergartens, social care facilities, and local administrations. The good news is that consumption in Bulgaria is small, 3–3.5 billion cubic metres, which will not be a problem to secure.

And inflation will jump even further; no one has even dared to calculate by how much. Before the news of the halted deliveries, the International Monetary Fund forecast double-digit inflation of 11% for Bulgaria this year and growth of no more than 3%, because of the energy dependence on Russia.

If Bulgaria manages to give up Russian gas, corruption will also fall sharply. Gazprom’s monopoly, propped up by Bulgarian governments, and the delays to the interconnectors with neighbouring countries: none of this is done out of love for Russia. From the times of post-Liberation Bulgaria to the present day, nothing new.

… If Russian diplomacy, if official Russia, did not pay so lavishly and did not support all the vagabonds and traitors in Bulgaria, the village would be at peace. How much money, how many bushels of rubles have been handed over and paid to these black and vile souls, starting with Tsankov and ending with Kronoslav Heruts!

Zahari Stoyanov, 1887

The contract with Gazprom expires at the end of 2022, and if a new one is not signed, part of the Bulgarian elite (politicians, energy experts, analysts, journalists, influencers) will lose their feeding troughs. And for Bulgaria it will be one dependency fewer. What remains are oil and the deliveries of fresh nuclear fuel, for which there will be a new tender in 2024.

And the president raised his fist again

Russian aggression, the halted Russian gas deliveries, and the possible sending of weapons to Ukraine made the president raise his fist again. This time against "his own", with whom he shared a trench until a year ago: Kiril Petkov and Asen Vassilev. The unmasking of Radev has begun; his rating soared with the protests in the summer of 2020, the removal of GERB from power, and the eight-month rule of his caretaker cabinets. Not that he particularly hid it, but before the war in Ukraine the pro-Kremlin stamp was not so visible.

Yes, people remember "Crimea is Russian, what else could it be!", a remark thrown out at a moment when the invasion of Ukraine was brewing; the insistence that Bulgaria's MiG-29s continue to be repaired in Russia; the opposition to the sanctions on Russia and to the deployment of NATO troops in Bulgaria; and now the resistance to (possibly) sending weapons to Ukraine. On May 5 the new national-conservative party of his former adviser Stefan Yanev, who shares the same views, will also appear.

Analysts and politicians have already described the halt of gas deliveries from Russia not merely as a tool for destabilizing Bulgaria, but also as yet another attempt to erode European unity. Support for such a policy, direct or indirect, including calls and suggestions to pay for gas in rubles, amounts to nothing more and nothing less than sabotage against the EU. Under the pretext of being concerned for the well-being and lives of Bulgarians, Radev did exactly that, also mentioning the fake news about Austria, which had supposedly agreed to pay for Russian gas in rubles and thereby guarantee its security.

"The government owes a categorical answer to the citizens who gave it their trust: whose interests does it serve, theirs or someone else's? It is high time the government gave clear proof that it understands and defends Bulgarian sovereignty and that its policy is guided by the Bulgarian national interest," Radev said before leaving on an official visit to Spain. His statement drew comments on social networks urging him to do the same.

In his few minutes in front of the journalists' microphones, he criticized not only his recent allies but also BSP leader Korneliya Ninova, with whom he has been in an open feud for a year. Both Radev and Ninova are against sending weapons to Ukraine, and BSP did not even send a representative in the delegation to Kyiv led by Prime Minister Petkov. "I am at a loss as to how the minister of economy will explain to Bulgarians and to people on the left, who have always been against wars, that Bulgarian weapons are fueling this conflict," Radev said.

Ninova hit back:

… you will have to explain to those who twice nominated and elected you president what arms export contracts your caretaker government signed. Because even now export permits are issued on the basis of the ones signed then. And you know very well that the destinations, then and now, are the same: over 50 countries, and not one of them is Ukraine.

But she also asked him to explain why he is furiously attacking "We Continue the Change", "the offspring you created in order to kill BSP", and where this synchrony with Borisov came from.

Radev has nothing to lose; he will not be running in elections, and a guaranteed 5-year term awaits him. But in fact he has lost something: the votes of those young people who came out to the square in front of the Presidency to defend the rule of law and who supported him for a second term. It may turn out that the summer of 2020 was the peak of his political career, and that he is already walking down the stairs.

At the same time he helped his "offspring", without meaning to, just like Russia. Never since the start of their political careers have Kiril Petkov and Asen Vassilev shown greater leadership than in their clash with the president.

The position that Mr. Radev expressed, that by giving weapons we prolong the conflict, is shameful, because it implicitly contains the understanding that Russia will win this conflict and that it is normal and good for Russia to win. I believe that Ukraine will win this conflict and that we must help it. (…) We must help Ukraine, because the alternative is for first Ukraine and then all of Eastern Europe to become vassal appendages again, which this government will not allow.

This was announced by Deputy Prime Minister and Finance Minister Asen Vassilev at the press conference on Wednesday. Now it remains to be seen how Vassilev and Petkov will move from words to deeds.

Next week parliament is expected to vote on and back the sending of weapons to Ukraine. The prime minister has at last stated categorically that "We Continue the Change" will support such a decision. "Democratic Bulgaria" announced that it will submit the proposal on May 4. By all appearances it will gather the necessary support; another partner in the ruling coalition, "There Is Such a People" (ITN), has also signaled its backing. Last week ITN leader Slavi Trifonov stated on Facebook that "if there is a red line, I am on the side of those who believe that Ukraine should be helped in every way, including with weapons".

Who remains on the other side of the red line: Radev, Yanev, Ninova, Kostadin Kostadinov. And for this demarcation line, too, we have Russia to thank.

Cover photo: Still from Dnevnik's video broadcast of the press conference of Kiril Petkov and Asen Vassilev on April 27, 2022.


TLS Beyond the Web: How MongoDB Uses Let’s Encrypt for Database-to-Application Security

Most of the time, people think about using Let’s Encrypt certificates to encrypt the communication between a website and server. But connections that need TLS are everywhere! In order for us to have an Internet that is 100% encrypted, we need to think beyond the website.

MongoDB’s managed multicloud database service, called Atlas, uses Let’s Encrypt certificates to secure the connection between customers’ applications and MongoDB databases, and between service points inside the platform. We spoke with Kenn White, Security Principal at MongoDB, about how his team uses Let’s Encrypt certificates for over two million databases, across 200 datacenters and three cloud providers.

"Let’s Encrypt has become a core part of our infrastructure stack," said Kenn. Interestingly, our relationship didn’t start out that way. MongoDB became a financial sponsor of Let’s Encrypt years earlier simply to support our mission to pursue security and privacy. MongoDB Atlas began to take off and it became clear that TLS would continue to be a priority as they brought on customers like currency exchanges, treasury platforms and retail payment networks. "The whole notion of high automation and no human touch all really appealed to us," said Kenn of MongoDB’s decision to use Let’s Encrypt.

MongoDB’s diverse customer roster means they support a wide variety of languages, libraries, and operating systems. Consequently, their monitoring is quite robust. Over the years, MongoDB has become a helpful resource for Let’s Encrypt engineers to identify edge case implementation bugs. Their ability to accurately identify issues early helps us respond efficiently; this is a benefit that ripples out across our diverse subscribers all over the Web.

The open sharing of information is a core part of how Let’s Encrypt operates. In fact, "transparency" is one of our key operating principles. The ability to see and understand how Let’s Encrypt is changing helped MongoDB gain trust and confidence in our operations. "I don’t think you can really put a price on the experience we’ve had working with the Let’s Encrypt engineering team," said Kenn. "One thing that I appreciate about Let’s Encrypt is that you’ve always been extremely transparent on your priorities and your roadmap vision. In terms of the technology and your telemetry, this is an evolution; where you are today is far better than where you were two years ago. And two years ago you were already head and shoulders above almost every peer in the industry."

Check out other blog posts in this series about how other large subscribers use Let’s Encrypt certificates.

TLS Simply and Automatically for Europe’s Largest Cloud Customers

Speed at scale: Let’s Encrypt serving Shopify’s 4.5 million domains

Supporting Let’s Encrypt

As a nonprofit project, 100% of our funding comes from contributions from our community of users and supporters. We depend on their support in order to provide our services for the public benefit. If your company or organization would like to sponsor Let’s Encrypt please email us at [email protected]. If you can support us with a donation, we ask that you make an individual contribution.

New IDC whitepaper released – Trusted Cloud: Overcoming the Tension Between Data Sovereignty and Accelerated Digital Transformation

A new International Data Corporation (IDC) whitepaper sponsored by AWS, Trusted Cloud: Overcoming the Tension Between Data Sovereignty and Accelerated Digital Transformation, examines the importance of the cloud in building the future of digital EU organizations. IDC predicts that 70% of CEOs of large European organizations will be incentivized to generate at least 40% of their revenues from digital by 2025, which means they have to accelerate their digital transformation. In a 2022 IDC survey of CEOs across Europe, 46% of respondents said they will accelerate the shift to cloud as their most strategic IT initiative in 2022.

In the whitepaper, IDC offers perspectives on how operational effectiveness, digital investment, and ultimately business growth need to be balanced with data sovereignty requirements. IDC defines data sovereignty as “a subset of digital sovereignty. It is the concept of data being subject to the laws and governance structures within the country it is collected or pertains to.”

IDC provides a perspective on some of the current discourse on cloud data sovereignty, including extraterritorial reach of foreign intelligence under national security laws, and the level of protection for individuals’ privacy in-country or with cross-border data transfer. The Schrems II decision and its implications with respect to personal data transfers between the EU and US have left many organizations grappling with how to comply with their legal requirements when transferring data outside the EU.

IDC provides the following background on controls in the cloud:

  • Cloud providers do not have unrestricted access to customer data in the cloud. Organizations retain all ownership and control of their data. Through credential and permission settings, the customer is the controller of who has access to their data.
  • Cloud providers use a rigorous set of organizational and technical controls based on least privilege to protect data from unauthorized access and inappropriate use.
  • Most cloud service operations, including maintenance and trouble-shooting, are fully automated. Should human access to customer data be required, it is temporary and limited to what is necessary to provide the contracted service to the customer. All access should be strictly logged, monitored, and audited to verify that activity is valid and compliant.
  • Technical controls such as encryption and key management assume greater importance. Encryption is considered fundamental to data protection best practices and highly recommended by regulators. Encrypted data processed in memory within hardware-based trusted execution environments (TEEs), also known as enclaves, can alleviate these regulatory concerns by rendering sensitive information invisible to host operating systems and cloud providers. The AWS Nitro System, the underlying platform that runs Amazon EC2 instances, is an industry example that provides such protection capability.
  • Independent accreditation against official standards is a recognized basis for assessing adherence to privacy and security practices. Approved by the European Data Protection Board, the EU Cloud Code of Conduct and CISPE’s Code of Conduct for Cloud Infrastructure Service Providers provide an accountability framework to help demonstrate compliance with processor obligations under GDPR Article 28. Whilst not required for GDPR compliance, CISPE requires accredited cloud providers to offer customers the option to retain all personal data in their customer content in the European Economic Area (EEA).
  • Greater data control and security is often cited as a driver to hosting data in-country. However, IDC notes that the physical location of the data has no bearing on mitigating data risk to cyber threats. Data residency can run counter to an organization’s objectives for security and resilience. More and more European organizations now are trusting the cloud for their security needs, as many organizations simply do not have the resources and expertise to provide the same security benefits as large cloud providers can.

For more information about how to translate your data sovereignty requirements into an actionable business and IT strategy, read the full IDC whitepaper Trusted Cloud: Overcoming the Tension Between Data Sovereignty and Accelerated Digital Transformation. You can also read more about AWS commitments to protect EU customers’ data on our EU data protection webpage.


Marta Taggart

Marta is a Seattle-native and Senior Product Marketing Manager in AWS Security Product Marketing, where she focuses on data protection services. Outside of work you’ll find her trying to convince Jack, her rescue dog, not to chase squirrels and crows (with limited success).

Orlando Scott-Cowley

Orlando is Amazon Web Services’ Worldwide Public Sector Lead for Security & Compliance in EMEA. Orlando helps customers with their security and compliance needs and with adopting AWS. Orlando specialises in Cyber Security, with a background in security consultancy, penetration testing and compliance; he holds a CISSP, CCSP and CCSK.

[$] The risks of embedded bare repositories in Git

Running code from inside a cloned Git repository is potentially risky, but
normally just inspecting such a repository is considered to be safe. As a
recent posting to the Git mailing list shows, however, there are still
risks lurking inside these repositories; code that lives in them can be
triggered in unexpected ways. In particular, malicious “bare” repositories
can be added as a subdirectory of a repository; they can be configured to run
code whenever Git commands are executed there, which is something that can
happen in surprising ways. There is now an effort
underway to try to address the problem in Git, without breaking the
legitimate need for including bare repositories into a Git tree.
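
One way to audit a checkout for this pattern is to look for directories that resemble bare repositories. The sketch below is only a heuristic and not Git's own detection logic: it assumes that a directory containing a `HEAD` file plus `objects/` and `refs/` subdirectories is a candidate, and `find_embedded_bare_repos` is a hypothetical helper name.

```python
import os


def looks_like_bare_repo(path):
    """Heuristic: a HEAD file plus objects/ and refs/ directories."""
    return (
        os.path.isfile(os.path.join(path, "HEAD"))
        and os.path.isdir(os.path.join(path, "objects"))
        and os.path.isdir(os.path.join(path, "refs"))
    )


def find_embedded_bare_repos(worktree):
    """Return directories under worktree that resemble bare repositories."""
    hits = []
    for dirpath, dirnames, _filenames in os.walk(worktree):
        if dirpath == worktree:
            # Skip the checkout's own metadata directory.
            if ".git" in dirnames:
                dirnames.remove(".git")
            continue
        if looks_like_bare_repo(dirpath):
            hits.append(dirpath)
            dirnames[:] = []  # no need to descend into a flagged directory
    return sorted(hits)


if __name__ == "__main__":
    for candidate in find_embedded_bare_repos("."):
        print("possible embedded bare repository:", candidate)
```

A hit is not proof of malice, since some projects legitimately ship bare repositories as test fixtures, but it marks a directory worth inspecting before running Git commands inside it.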

New – Storage-Optimized Amazon EC2 Instances (I4i) Powered by Intel Xeon Scalable (Ice Lake) Processors

Over the years we have released multiple generations of storage-optimized Amazon Elastic Compute Cloud (Amazon EC2) instances including the HS1 (2012), D2 (2015), I2 (2013), I3 (2017), I3en (2019), D3/D3en (2020), and Im4gn/Is4gen (2021). These instances are used to host high-performance real-time relational databases, distributed file systems, data warehouses, key-value stores, and more.

New I4i Instances
Today I am happy to introduce the new I4i instances, powered by the latest generation Intel Xeon Scalable (Ice Lake) Processors with an all-core turbo frequency of 3.5 GHz.

The instances offer up to 30 TB of NVMe storage using AWS Nitro SSD devices that are custom-built by AWS, and are designed to minimize latency and maximize transactions per second (TPS) on workloads that need very fast access to medium-sized datasets on local storage. This includes transactional databases such as MySQL, Oracle DB, and Microsoft SQL Server, as well as NoSQL databases: MongoDB, Couchbase, Aerospike, Redis, and the like. They are also an ideal fit for workloads that can benefit from very high compute performance per TB of storage such as data analytics and search engines.

Here are the specs:

Instance Name | vCPUs | Memory (DDR4) | Local NVMe Storage (AWS Nitro SSD) | Sequential Read Throughput (128 KB Blocks) | EBS-Optimized Bandwidth | Network Bandwidth
i4i.large | 2 | 16 GiB | 468 GB | 350 MB/s | Up to 10 Gbps | Up to 10 Gbps
i4i.xlarge | 4 | 32 GiB | 937 GB | 700 MB/s | Up to 10 Gbps | Up to 10 Gbps
i4i.2xlarge | 8 | 64 GiB | 1,875 GB | 1,400 MB/s | Up to 10 Gbps | Up to 12 Gbps
i4i.4xlarge | 16 | 128 GiB | 3,750 GB | 2,800 MB/s | Up to 10 Gbps | Up to 25 Gbps
i4i.8xlarge | 32 | 256 GiB | 7,500 GB (2 x 3,750 GB) | 5,600 MB/s | 10 Gbps | 18.75 Gbps
i4i.16xlarge | 64 | 512 GiB | 15,000 GB (4 x 3,750 GB) | 11,200 MB/s | 20 Gbps | 37.5 Gbps
i4i.32xlarge | 128 | 1024 GiB | 30,000 GB (8 x 3,750 GB) | 22,400 MB/s | 40 Gbps | 75 Gbps
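
The figures above suggest that sequential read throughput scales linearly with instance size. Here is a quick check of that arithmetic, using only the vCPU and throughput values from the specs; the variable names are just for illustration.

```python
# (vCPUs, sequential read throughput in MB/s) copied from the specs above.
I4I_SPECS = {
    "i4i.large": (2, 350),
    "i4i.xlarge": (4, 700),
    "i4i.2xlarge": (8, 1400),
    "i4i.4xlarge": (16, 2800),
    "i4i.8xlarge": (32, 5600),
    "i4i.16xlarge": (64, 11200),
    "i4i.32xlarge": (128, 22400),
}

# Throughput per vCPU for each size; one distinct value means linear scaling.
per_vcpu = {name: mbps / vcpus for name, (vcpus, mbps) in I4I_SPECS.items()}

for name, value in per_vcpu.items():
    print(f"{name}: {value:.0f} MB/s per vCPU")
```

Every size works out to the same ratio, so doubling the instance size doubles the listed sequential read throughput.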

In comparison to the Xen-based I3 instances, the Nitro-powered I4i instances give you:

  • Up to 60% lower storage I/O latency, along with up to 75% lower storage I/O latency variability.
  • A new, larger instance size (i4i.32xlarge).
  • Up to 30% better compute price/performance.

The i4i.16xlarge and i4i.32xlarge instances give you control over C-states, and the i4i.32xlarge instances support non-uniform memory access (NUMA). All of the instances support AVX-512, and use Intel Total Memory Encryption (TME) to deliver always-on memory encryption.

From Our Customers
AWS customers and AWS service teams have been putting these new instances to the test ahead of today’s launch. Here’s what they had to say:

Redis Enterprise powers mission-critical applications for over 8,000 organizations. According to Yiftach Shoolman (Co-Founder and CTO of Redis):

We are thrilled with the performance we are seeing from the Amazon EC2 I4i instances which use the new low latency AWS Nitro SSDs. Our testing shows I4i instances delivering an astonishing 2.9x higher query throughput than the previous generation I3 instances. We have also tested with various read and write mixes, and observed consistent and linearly scaling performance.

ScyllaDB is a high performance NoSQL database that can take advantage of high performance cloud computing instances. Avi Kivity (Co-Founder and CTO of ScyllaDB) told us:

When we tested I4i instances, we observed up to 2.7x increase in throughput per vCPU compared to I3 instances for reads. With an even mix of reads and writes, we observed 2.2x higher throughput per vCPU, with a 40% reduction in average latency compared to I3 instances. We are excited for the incredible performance and value these new instances will enable for our customers going forward.

Amazon QuickSight is a business intelligence service. After testing, Tracy Daugherty (General Manager, Amazon QuickSight) reported that:

I4i instances have demonstrated superior performance over previous generation I instances, with a 30% improvement across operations. We look forward to using I4i to further elevate performance for our customers.

Available Now

You can launch I4i instances today in the AWS US East (N. Virginia), US East (Ohio), US West (Oregon), and Europe (Ireland) Regions (with more to come) in On-Demand and Spot form. Savings Plans and Reserved Instances are available, as are Dedicated Instances and Dedicated Hosts.

In order to take advantage of the performance benefits of these new instances, be sure to use recent AMIs that include current ENA drivers and support for NVMe 1.4.

To learn more, visit the I4i instance home page.


Zero-Day Vulnerabilities Are on the Rise

Both Google and Mandiant are reporting a significant increase in the number of zero-day vulnerabilities reported in 2021.


2021 included the detection and disclosure of 58 in-the-wild 0-days, the most ever recorded since Project Zero began tracking in mid-2014. That’s more than double the previous maximum of 28 detected in 2015 and especially stark when you consider that there were only 25 detected in 2020. We’ve tracked publicly known in-the-wild 0-day exploits in this spreadsheet since mid-2014.

While we often talk about the number of 0-day exploits used in-the-wild, what we’re actually discussing is the number of 0-day exploits detected and disclosed as in-the-wild. And that leads into our first conclusion: we believe the large uptick in in-the-wild 0-days in 2021 is due to increased detection and disclosure of these 0-days, rather than simply increased usage of 0-day exploits.


In 2021, Mandiant Threat Intelligence identified 80 zero-days exploited in the wild, which is more than double the previous record volume in 2019. State-sponsored groups continue to be the primary actors exploiting zero-day vulnerabilities, led by Chinese groups. The proportion of financially motivated actors, particularly ransomware groups, deploying zero-day exploits also grew significantly, and nearly 1 in 3 identified actors exploiting zero-days in 2021 was financially motivated. Threat actors exploited zero-days in Microsoft, Apple, and Google products most frequently, likely reflecting the popularity of these vendors. The vast increase in zero-day exploitation in 2021, as well as the diversification of actors using them, expands the risk portfolio for organizations in nearly every industry sector and geography, particularly those that rely on these popular systems.

News article.

How to control access to AWS resources based on AWS account, OU, or organization

AWS Identity and Access Management (IAM) recently launched new condition keys to make it simpler to control access to your resources along your Amazon Web Services (AWS) organizational boundaries. AWS recommends that you set up multiple accounts as your workloads grow, and you can use multiple AWS accounts to isolate workloads or applications that have specific security requirements. By using the new conditions, aws:ResourceOrgID, aws:ResourceOrgPaths, and aws:ResourceAccount, you can define access controls based on an AWS resource’s organization, organizational unit (OU), or account. These conditions make it simpler to require that your principals (users and roles) can only access resources inside a specific boundary within your organization. You can combine the new conditions with other IAM capabilities to restrict access to and from AWS accounts that are not part of your organization.

This post will help you get started using the new condition keys. We’ll show the details of the new condition keys and walk through a detailed example based on the following scenario. We’ll also provide references and links to help you learn more about how to establish access control perimeters around your AWS accounts.

Consider a common scenario where you would like to prevent principals in your AWS organization from adding objects to Amazon Simple Storage Service (Amazon S3) buckets that don’t belong to your organization. To accomplish this, you can configure an IAM policy to deny access to S3 actions unless aws:ResourceOrgID matches your unique AWS organization ID. Because the policy references your entire organization, rather than individual S3 resources, you have a convenient way to maintain this security posture across any number of resources you control. The new conditions give you the tools to create a security baseline for your IAM principals and help you prevent unintended access to resources in accounts that you don’t control. You can attach this policy to an IAM principal to apply this rule to a single user or role, or use service control policies (SCPs) in AWS Organizations to apply the rule broadly across your AWS accounts. IAM principals that are subject to this policy will only be able to perform S3 actions on buckets and objects within your organization, regardless of their other permissions granted through IAM policies or S3 bucket policies.

New condition key details

You can use the aws:ResourceOrgID, aws:ResourceOrgPaths, and aws:ResourceAccount condition keys in IAM policies to place controls on the resources that your principals can access. The following table explains the new condition keys and what values these keys can take.

Condition key | Description | Operator | Single/multi value | Value
aws:ResourceOrgID | AWS organization ID of the resource being accessed | All string operators | Single value key | Any AWS organization ID
aws:ResourceOrgPaths | Organization path of the resource being accessed | All string operators | Multi-value key | Organization paths of AWS organization IDs and organizational unit IDs
aws:ResourceAccount | AWS account ID of the resource being accessed | All string operators | Single value key | Any AWS account ID

Note: Of the three keys, only aws:ResourceOrgPaths is a multi-value condition key, while aws:ResourceAccount and aws:ResourceOrgID are single-value keys. For information on how to use multi-value keys, see Creating a condition with multiple keys or values in the IAM documentation.

Resource owner keys compared to principal owner keys

The new IAM condition keys complement the existing principal condition keys aws:PrincipalAccount, aws:PrincipalOrgPaths, and aws:PrincipalOrgID. The principal condition keys help you define which AWS accounts, organizational units (OUs), and organizations are allowed to access your resources. For more information on the principal conditions, see Use IAM to share your AWS resources with groups of AWS accounts in AWS Organizations on the AWS Security Blog.

Using the principal and resource keys together helps you establish permission guardrails around your AWS principals and resources, and makes it simpler to keep your data inside the organization boundaries you define as you continue to scale. For example, you can define identity-based policies that prevent your IAM principals from accessing resources outside your organization (by using the aws:ResourceOrgID condition). Next, you can define resource-based policies that prevent IAM principals outside your organization from accessing resources that are inside your organization boundary (by using the aws:PrincipalOrgID condition). The combination of both policies prevents any access to and from AWS accounts that are not part of your organization. In the next sections, we’ll walk through an example of how to configure the identity-based policy in your organization. For the resource-based policy, you can follow along with the example in An easier way to control access to AWS resources by using the AWS organization of IAM principals on the AWS Security blog.
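
As a rough illustration of how the two halves fit together, the sketch below builds both policy documents as Python dictionaries and prints them as JSON. The organization ID `o-exampleorgid` and the bucket name `example-bucket` are placeholders, not values from this post, and the statements are deliberately minimal; treat them as a starting point to adapt rather than finished policies.

```python
import json

ORG_ID = "o-exampleorgid"   # placeholder: substitute your organization ID
BUCKET = "example-bucket"   # placeholder: substitute your bucket name


def identity_guardrail(org_id):
    """Identity-based policy: deny S3 actions on resources outside the org."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "DenyS3AccessOutsideMyOrg",
            "Effect": "Deny",
            "Action": "s3:*",
            "Resource": "*",
            "Condition": {"StringNotEquals": {"aws:ResourceOrgID": org_id}},
        }],
    }


def resource_guardrail(org_id, bucket):
    """Bucket policy: deny S3 actions from principals outside the org."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "DenyS3AccessFromOutsideMyOrg",
            "Effect": "Deny",
            "Principal": "*",
            "Action": "s3:*",
            "Resource": [f"arn:aws:s3:::{bucket}", f"arn:aws:s3:::{bucket}/*"],
            "Condition": {"StringNotEquals": {"aws:PrincipalOrgID": org_id}},
        }],
    }


if __name__ == "__main__":
    print(json.dumps(identity_guardrail(ORG_ID), indent=2))
    print(json.dumps(resource_guardrail(ORG_ID, BUCKET), indent=2))
```

A broad deny on the resource side may also block callers that carry no organization ID, such as AWS services acting on your behalf, so it is worth testing a policy like this in a non-production account first.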

Setup for the examples

In the following sections, we’ll show an example IAM policy for each of the new conditions. To follow along with Example 1, which uses aws:ResourceAccount, you’ll just need an AWS account.

To follow along with Examples 2 and 3 that use aws:ResourceOrgPaths and aws:ResourceOrgID respectively, you’ll need to have an organization in AWS Organizations and at least one OU created. This blog post assumes that you have some familiarity with the basic concepts in IAM and AWS Organizations. If you need help creating an organization or want to learn more about AWS Organizations, visit Getting Started with AWS Organizations in the AWS documentation.

Which IAM policy type should I use?

You can implement the following examples as identity-based policies, or in SCPs that are managed in AWS Organizations. If you want to establish a boundary for some of your IAM principals, we recommend that you use identity-based policies. If you want to establish a boundary for an entire AWS account or for your organization, we recommend that you use SCPs. Because SCPs apply to an entire AWS account, you should take care when you apply the following policies to your organization, and account for any exceptions to these rules that might be necessary for some AWS services to function properly.
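
To show the difference between the two attachment routes, here is a minimal sketch of both. It assumes the clients passed in are boto3 clients for IAM and AWS Organizations; the function names, role name, and policy names are hypothetical, and error handling is omitted.

```python
import json


def apply_to_role(iam_client, role_name, policy_name, policy):
    """Attach the guardrail inline to a single role (identity-based policy)."""
    iam_client.put_role_policy(
        RoleName=role_name,
        PolicyName=policy_name,
        PolicyDocument=json.dumps(policy),
    )


def apply_as_scp(org_client, target_id, policy_name, policy):
    """Create an SCP and attach it to a root, OU, or account ID."""
    created = org_client.create_policy(
        Name=policy_name,
        Description="Deny access to resources outside the boundary",
        Type="SERVICE_CONTROL_POLICY",
        Content=json.dumps(policy),
    )
    policy_id = created["Policy"]["PolicySummary"]["Id"]
    org_client.attach_policy(PolicyId=policy_id, TargetId=target_id)
    return policy_id
```

With boto3 installed and credentials configured, the clients would come from `boto3.client("iam")` and `boto3.client("organizations")`. The SCP route has to run from the organization's management account (or a delegated administrator), which is one more reason to roll it out cautiously.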

Example 1: Restrict access to AWS resources within a specific AWS account

Let’s look at an example IAM policy that restricts access along the boundary of a single AWS account. For this example, say that you have an IAM principal in account 222222222222, and you want to prevent the principal from accessing S3 objects outside of this account. To create this effect, you could attach the following IAM policy.

  {
    "Version": "2012-10-17",
    "Statement": [{
      "Sid": "DenyS3AccessOutsideMyBoundary",
      "Effect": "Deny",
      "Action": ["s3:*"],
      "Resource": "*",
      "Condition": {
        "StringNotEquals": {
          "aws:ResourceAccount": ["222222222222"]
        }
      }
    }]
  }

Note: This policy is not meant to replace your existing IAM access controls, because it does not grant any access. Instead, this policy can act as an additional guardrail for your other IAM permissions. You can use a policy like this to prevent your principals from accessing any AWS accounts that you don’t know or control, regardless of the permissions granted through other IAM policies.

This policy uses a Deny effect to block access to S3 actions unless the S3 resource being accessed is in account 222222222222. This policy prevents S3 access to accounts outside of the boundary of a single AWS account. You can use a policy like this one to limit your IAM principals to access only the resources that are inside your trusted AWS accounts. To implement a policy like this example yourself, replace account ID 222222222222 in the policy with your own AWS account ID. For a policy you can apply to multiple accounts while still maintaining this restriction, you could alternatively replace the account ID with the aws:PrincipalAccount condition key, to require that the principal and resource must be in the same account (see Example 3 in this post for more details on how to accomplish this).
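
To make the deny logic concrete, here is a small sketch that builds the same-account variant mentioned above, using the `${aws:PrincipalAccount}` policy variable, and models how the `StringNotEquals` check would play out. The `deny_applies` helper is a hypothetical simplification for illustration only; real IAM policy evaluation involves many more rules.

```python
SAME_ACCOUNT_GUARDRAIL = {
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "DenyS3AccessOutsideMyAccount",
        "Effect": "Deny",
        "Action": "s3:*",
        "Resource": "*",
        "Condition": {
            "StringNotEquals": {
                # Policy variable: resolved per request to the caller's account.
                "aws:ResourceAccount": "${aws:PrincipalAccount}"
            }
        },
    }],
}


def deny_applies(policy, principal_account, resource_account):
    """Toy model: the Deny fires when the resource is in another account."""
    condition = policy["Statement"][0]["Condition"]["StringNotEquals"]
    expected = condition["aws:ResourceAccount"].replace(
        "${aws:PrincipalAccount}", principal_account
    )
    return resource_account != expected


if __name__ == "__main__":
    # Same account: the Deny does not apply, so other policies decide.
    print(deny_applies(SAME_ACCOUNT_GUARDRAIL, "222222222222", "222222222222"))
    # Different account: the Deny applies.
    print(deny_applies(SAME_ACCOUNT_GUARDRAIL, "222222222222", "333333333333"))
```

Because the account ID is no longer hard-coded, the same document can be attached in any account without editing.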

Organization setup: Welcome to AnyCompany

For the next two examples, we’ll use an example organization called AnyCompany that we created in AWS Organizations. You can create a similar organization to follow along directly with these examples, or adapt the sample policies to fit your own organization. Figure 1 shows the organization structure for AnyCompany.

Figure 1: Organization structure for AnyCompany

Like all organizations, AnyCompany has an organization root. Under the root are three OUs: Media, Sports, and Governance. Under the Sports OU, there are three more OUs: Baseball, Basketball, and Football. AWS accounts in this organization are spread across all the OUs based on their business purpose. In total, there are six OUs in this organization.

Example 2: Restrict access to AWS resources within my organizational unit

Now that you’ve seen what the AnyCompany organization looks like, let’s walk through another example IAM policy that you can use to restrict access to a specific part of your organization. For this example, let’s say you want to restrict S3 object access within the following OUs in the AnyCompany organization:

  • Media
  • Sports
  • Baseball
  • Basketball
  • Football

To define a boundary around these OUs, you don’t need to list all of them in your IAM policy. Instead, you can use the organization structure to your advantage. The Baseball, Basketball, and Football OUs share a parent, the Sports OU. You can use the new aws:ResourceOrgPaths key to prevent access outside of the Media OU, the Sports OU, and any OUs under it. Here’s the IAM policy that achieves this effect.

  "Version": "2012-10-17",
  "Statement": [
      "Sid": "DenyS3AccessOutsideMyBoundary",
      "Effect": "Deny",
      "Action": [
      "Resource": "*",
      "Condition": {
        "ForAllValues:StringNotLike": {
          "aws:ResourceOrgPaths": [

Note: Like the earlier example, this policy does not grant any access. Instead, this policy provides a backstop for your other IAM permissions, preventing your principals from accessing S3 objects outside an OU-defined boundary. If you want to require that your IAM principals consistently follow this rule, we recommend that you apply this policy as an SCP. In this example, we attached this policy to the root of our organization, applying it to all principals across all accounts in the AnyCompany organization.

The policy denies access to S3 actions unless the S3 resource being accessed is in a specific set of OUs in the AnyCompany organization. This policy is identical to Example 1, except for the condition block: The condition requires that aws:ResourceOrgPaths contains any of the listed OU paths. Because aws:ResourceOrgPaths is a multi-value condition, the policy uses the ForAllValues:StringNotLike operator to compare the values of aws:ResourceOrgPaths to the list of OUs in the policy.

The first OU path in the list is for the Media OU. The second OU path is the Sports OU, but it also adds the wildcard character * to the end of the path. The wildcard * matches any combination of characters, and so this condition matches both the Sports OU and any other OU further down its path. Using wildcards in the OU path allows you to implicitly reference other OUs inside the Sports OU, without having to list them explicitly in the policy. For more information about wildcards, refer to Using wildcards in resource ARNs in the IAM documentation.
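To make the evaluation concrete, here is a minimal Python sketch of how a ForAllValues:StringNotLike condition behaves. It illustrates the matching logic only and is not how IAM is implemented. The organization, root, and OU IDs are hypothetical placeholders, and the helper names iam_like and deny_applies are ours.

```python
import re

# Hypothetical OU paths; the IDs below are placeholders, not values from AnyCompany.
ALLOWED_OU_PATHS = [
    "o-exampleorgid/r-ab12/ou-ab12-media001/",   # Media OU only (no wildcard)
    "o-exampleorgid/r-ab12/ou-ab12-sports01/*",  # Sports OU and everything under it
]


def iam_like(pattern: str, value: str) -> bool:
    """Approximate IAM wildcard matching: '*' is any run of characters, '?' is one character."""
    regex = "".join(
        ".*" if ch == "*" else "." if ch == "?" else re.escape(ch) for ch in pattern
    )
    return re.fullmatch(regex, value, flags=re.DOTALL) is not None


def deny_applies(resource_org_paths, allowed_patterns=ALLOWED_OU_PATHS) -> bool:
    """ForAllValues:StringNotLike is true when every request value matches none of the patterns.

    Like ForAllValues in general, this is also true for an empty set of request values.
    """
    return all(
        not any(iam_like(pattern, path) for pattern in allowed_patterns)
        for path in resource_org_paths
    )


baseball = "o-exampleorgid/r-ab12/ou-ab12-sports01/ou-ab12-baseball/"
governance = "o-exampleorgid/r-ab12/ou-ab12-govern01/"
print(deny_applies([baseball]))    # False: the path matches the Sports wildcard
print(deny_applies([governance]))  # True: the path matches neither pattern
```

In this sketch, a resource in an OU nested under Media would still be denied, because the Media path has no trailing wildcard. That is the practical difference between the two list entries.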

Example 3: Restrict access to AWS resources within my organization

Finally, we’ll look at a very simple example of a boundary that is defined at the level of an entire organization. This is the same use case as the preceding two examples (restrict access to S3 objects), but scoped to an organization instead of an account or collection of OUs.

  "Version": "2012-10-17",
  "Statement": [
      "Sid": "DenyS3AccessOutsideMyBoundary",
      "Effect": "Deny",
      "Action": [
      "Resource": "arn:aws:s3:::*/*",
      "Condition": {
        "StringNotEquals": {
          "aws:ResourceOrgID": "${aws:PrincipalOrgID}"

Note: Like the earlier examples, this policy does not grant any access. Instead, this policy provides a backstop for your other IAM permissions, preventing your principals from accessing S3 objects outside your organization regardless of their other access permissions. If you want to require that your IAM principals consistently follow this rule, we recommend that you apply this policy as an SCP. As in the previous example, we attached this policy to the root of our organization, applying it to all accounts in the AnyCompany organization.

The policy denies access to S3 actions unless the S3 resource being accessed is in the same organization as the IAM principal that is accessing it. This policy is identical to Example 1, except for the condition block: The condition requires that aws:ResourceOrgID and aws:PrincipalOrgID must be equal to each other. With this requirement, the principal making the request and the resource being accessed must be in the same organization. This policy also applies to S3 resources that are created after the policy is put into effect, so it is simple to maintain the same security posture across all your resources.
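If you generate policies programmatically, the following Python sketch assembles a complete policy document with this condition and prints it as JSON. The Sid, resource ARN, and condition follow the example above. The action list is not shown in the excerpt, so "s3:*" is an assumption based on the description "S3 actions", and the function name is hypothetical.

```python
import json


def build_org_boundary_policy(actions=("s3:*",)):
    """Return a deny policy that blocks the given actions on resources outside the caller's organization.

    The default action list is an assumption; adjust it to the actions you want to guard.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyS3AccessOutsideMyBoundary",
                "Effect": "Deny",
                "Action": list(actions),
                "Resource": "arn:aws:s3:::*/*",
                "Condition": {
                    "StringNotEquals": {
                        # Policy variable: resolved per request to the principal's organization ID.
                        "aws:ResourceOrgID": "${aws:PrincipalOrgID}"
                    }
                },
            }
        ],
    }


print(json.dumps(build_org_boundary_policy(), indent=2))
```

Because the condition compares two keys instead of hard-coding an organization ID, the same document can be reused across organizations without edits.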

For more information about aws:PrincipalOrgID, refer to AWS global condition context keys in the IAM documentation.

Learn more

In this post, we explored the new conditions, and walked through a few examples to show you how to restrict access to S3 objects across the boundary of an account, OU, or organization. These tools work for more than just S3, though: You can use the new conditions to help you protect a wide variety of AWS services and actions. Here are a few links that you may want to look at:

If you have any questions, comments, or concerns, contact AWS Support or start a new thread on the AWS Identity and Access Management forum. Thanks for reading about this new feature. If you have feedback about this post, submit comments in the Comments section below.

Rishi Mehrotra

Rishi is a Product Manager in AWS IAM. He enjoys working with customers and influencing product decisions. Prior to Amazon, Rishi worked for enterprise IT customers after receiving an engineering degree in computer science. He recently pursued an MBA from The University of Chicago Booth School of Business. Outside of work, Rishi enjoys biking, reading, and playing with his kids.


Michael Switzer

Mike is the product manager for the Identity and Access Management service at AWS. He enjoys working directly with customers to identify solutions to their challenges, and using data-driven decision making to drive his work. Outside of work, Mike is an avid cyclist and outdoorsperson. He holds a master’s degree in computational mathematics from the University of Washington.

Optimize AI/ML workloads for sustainability: Part 3, deployment and monitoring

We’re celebrating Earth Day 2022 from 4/22 through 4/29 with posts that highlight how to build, maintain, and refine your workloads for sustainability.

AWS estimates that inference (the process of using a trained machine learning [ML] algorithm to make a prediction) makes up 90 percent of the cost of an ML model. Given that with AWS you pay for what you use, we estimate that inference also generally equates to most of the resource usage within an ML lifecycle.

In this series, we’re following the phases of the Well-Architected machine learning lifecycle (Figure 1) to optimize your artificial intelligence (AI)/ML workloads. In Part 3, our final piece in the series, we show you how to reduce the environmental impact of your ML workload once your model is in production.

If you missed the first parts of this series, in Part 1, we showed you how to examine your workload to help you 1) evaluate the impact of your workload, 2) identify alternatives to training your own model, and 3) optimize data processing. In Part 2, we identified ways to reduce the environmental impact of developing, training, and tuning ML models.

Figure 1. ML lifecycle


Select sustainable AWS Regions

As mentioned in Part 1, select an AWS Region with sustainable energy sources. When regulations and legal aspects allow, choose Regions near Amazon renewable energy projects and Regions where the grid has low published carbon intensity to deploy your model.

Align SLAs with sustainability goals

Define SLAs that support your sustainability goals while meeting your business requirements:

Use efficient silicon

For CPU-based ML inference, use AWS Graviton3. These processors offer the best performance per watt in Amazon Elastic Compute Cloud (Amazon EC2). They use up to 60% less energy than comparable EC2 instances. Graviton3 processors deliver up to three times better performance compared to Graviton2 processors for ML workloads, and they support bfloat16.

For deep learning workloads, the Amazon EC2 Inf1 instances (based on custom designed AWS Inferentia chips) deliver 2.3 times higher throughput and 80% lower cost compared to g4dn instances. Inf1 has 50% higher performance per watt than g4dn, which makes it the most sustainable ML accelerator Amazon EC2 offers.

Make efficient use of GPU

Use Amazon Elastic Inference to attach just the right amount of GPU-powered inference acceleration to any EC2 or SageMaker instance type or Amazon Elastic Container Service (Amazon ECS) task.

While training jobs batch process hundreds of data samples in parallel, inference jobs usually process a single input in real time, and thus consume a small amount of GPU compute. Elastic Inference allows you to reduce the cost and environmental impact of your inference by using GPU resources more efficiently.

Optimize models for inference

Improve efficiency of your models by compiling them into optimized forms with the following:

  • Various open-source libraries (like Treelite for decision tree ensembles)
  • Third-party tools like Hugging Face Infinity, which allows you to speed up transformer models and run inference not only on GPU but also on CPU.
  • SageMaker Neo, whose runtime consumes as little as one-tenth the footprint of a deep learning framework and which optimizes models to perform up to 25 times faster with no loss in accuracy (example with XGBoost).

Deploying more efficient models means you need fewer resources for inference.

Deploy multiple models behind a single endpoint

SageMaker provides three methods to deploy multiple models to a single endpoint to improve endpoint utilization:

  1. Host multiple models in one container behind one endpoint. Multi-model endpoints are served using a single container. This can help you cut up to 90 percent of your inference costs and carbon emissions.
  2. Host multiple models that use different containers behind one endpoint.
  3. Host a linear sequence of containers in an inference pipeline behind a single endpoint.

Sharing endpoint resources is more sustainable and less expensive than deploying a single model behind one endpoint.

Right-size your inference environment

Right-size your endpoints by using metrics from Amazon CloudWatch or by using the Amazon SageMaker Inference Recommender. This tool can run load testing jobs and recommend the proper instance type to host your model. When you use the appropriate instance type, you limit the carbon emission associated with over-provisioning.

If your workload has intermittent or unpredictable traffic, configure autoscaling inference endpoints in SageMaker to optimize your endpoints. Autoscaling monitors your endpoints and dynamically adjusts their capacity to maintain steady and predictable performance using as few resources as possible. You can also try Serverless Inference (in preview), which automatically launches compute resources and scales them in and out depending on traffic, which eliminates idle resources.
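As an illustration, the following Python sketch configures target tracking autoscaling for an endpoint variant through Application Auto Scaling. It assumes a boto3 "application-autoscaling" client is passed in. The function name, the variant name "AllTraffic", the capacity limits, the target value, and the cooldowns are illustrative assumptions that you should tune for your own workload.

```python
def configure_endpoint_autoscaling(
    autoscaling_client,
    endpoint_name,
    variant_name="AllTraffic",
    min_instances=1,
    max_instances=4,
    invocations_per_instance=100.0,
):
    """Register a SageMaker endpoint variant as a scalable target and attach a target tracking policy."""
    resource_id = f"endpoint/{endpoint_name}/variant/{variant_name}"
    dimension = "sagemaker:variant:DesiredInstanceCount"

    # Declare the lower and upper bounds for the number of instances behind the variant.
    autoscaling_client.register_scalable_target(
        ServiceNamespace="sagemaker",
        ResourceId=resource_id,
        ScalableDimension=dimension,
        MinCapacity=min_instances,
        MaxCapacity=max_instances,
    )
    # Scale so that each instance handles roughly the target number of invocations per minute.
    autoscaling_client.put_scaling_policy(
        PolicyName=f"{endpoint_name}-invocations-target",
        ServiceNamespace="sagemaker",
        ResourceId=resource_id,
        ScalableDimension=dimension,
        PolicyType="TargetTrackingScaling",
        TargetTrackingScalingPolicyConfiguration={
            "TargetValue": invocations_per_instance,
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance"
            },
            "ScaleInCooldown": 300,
            "ScaleOutCooldown": 60,
        },
    )
    return resource_id


# Usage (requires boto3 and AWS credentials; "my-endpoint" is a placeholder):
#   import boto3
#   configure_endpoint_autoscaling(boto3.client("application-autoscaling"), "my-endpoint")
```

Keeping the maximum capacity close to your observed peak, rather than an arbitrary high number, limits how far the endpoint can over-provision.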

Consider inference at the edge

When working on Internet of Things (IoT) use cases, evaluate if ML inference at the edge can reduce the carbon footprint of your workload. To do this, consider factors like the compute capacity of your devices, their energy consumption, or the emissions related to data transfer to the cloud. When deploying ML models to edge devices, consider using SageMaker Edge Manager, which integrates with SageMaker Neo and AWS IoT Greengrass (Figure 2).

Figure 2. Run inference at the edge with SageMaker Edge

Device manufacturing represents 32-57 percent of the global Information Communication Technology carbon footprint. If your ML model is optimized, it requires fewer compute resources. You can then perform inference on lower specification machines, which minimizes the environmental impact of the device manufacturing and uses less energy.

The following techniques compress the size of models for deployment, which speeds up inference and saves energy without significant loss of accuracy:

  • Pruning removes weights (learnable parameters) that don’t contribute much to the model.
  • Quantization represents numbers with low-bit integers without incurring significant loss in accuracy. Specifically, you can reduce resource usage by replacing the usual single-precision floating-point (32 bit) parameters in an inference model with half-precision (16 bit), bfloat16 (16 bit, but the same dynamic range as 32 bit), or 8-bit integer values.
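The two techniques can be sketched in a few lines of plain Python. This is a toy illustration on a flat list of weights, assuming magnitude-based pruning and symmetric 8-bit quantization with one shared scale. Production frameworks use more elaborate schemes, and the function names here are ours.

```python
def prune_by_magnitude(weights, sparsity):
    """Zero out the given fraction of weights with the smallest absolute values."""
    n_prune = int(len(weights) * sparsity)
    smallest = sorted(range(len(weights)), key=lambda i: abs(weights[i]))[:n_prune]
    pruned = list(weights)
    for i in smallest:
        pruned[i] = 0.0
    return pruned


def quantize_int8(weights):
    """Symmetric quantization: map floats to integers in [-127, 127] using a shared scale."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float values from the quantized integers."""
    return [q * scale for q in quantized]


weights = [0.5, -0.01, 0.2, 0.003, -1.27, 1.0]
pruned = prune_by_magnitude(weights, sparsity=0.5)
quantized, scale = quantize_int8(pruned)
restored = dequantize(quantized, scale)

print(pruned)     # half of the weights are now zero
print(quantized)  # each remaining weight fits in one signed byte
print(max(abs(a - b) for a, b in zip(pruned, restored)))  # worst-case rounding error
```

Each quantized weight needs 8 bits instead of 32, and the rounding error per weight stays within half of the scale, which is why accuracy often degrades only slightly.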

Archive or delete unnecessary artifacts

Compress and reduce the volume of logs you keep during the inference phase. By default, CloudWatch retains logs indefinitely. By setting limited retention time for your inference logs, you’ll avoid the carbon footprint of unnecessary log storage. Also delete unused versions of your models and custom container images from your repositories.
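As one way to apply this, the Python sketch below sets a retention period on log groups that currently never expire. It assumes a boto3 "logs" client is passed in. The function name and the 30-day default are illustrative, and the retention value must be one of the periods that CloudWatch Logs accepts.

```python
def set_log_retention(logs_client, log_group_prefix, retention_days=30):
    """Set a retention period on matching log groups that currently never expire.

    Returns the names of the log groups that were updated.
    """
    updated = []
    paginator = logs_client.get_paginator("describe_log_groups")
    for page in paginator.paginate(logGroupNamePrefix=log_group_prefix):
        for group in page["logGroups"]:
            # A log group without "retentionInDays" keeps its events indefinitely.
            if "retentionInDays" not in group:
                logs_client.put_retention_policy(
                    logGroupName=group["logGroupName"],
                    retentionInDays=retention_days,
                )
                updated.append(group["logGroupName"])
    return updated


# Usage (requires boto3 and AWS credentials; the prefix is an assumption about
# where your endpoint logs are stored):
#   import boto3
#   set_log_retention(boto3.client("logs"), "/aws/sagemaker/Endpoints/")
```

The function skips log groups that already have a retention policy, so it does not override periods that were chosen deliberately.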


Retrain only when necessary

Monitor your ML model in production and retrain it only when required. Models usually need to be retrained because of model drift, robustness requirements, or the availability of new ground truth data. Instead of retraining arbitrarily, automate your model drift detection and retrain only when your model’s predictive performance has fallen below defined KPIs.
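The retraining decision itself can be a simple, explicit rule. The sketch below assumes you already collect a quality score per evaluation window (for example, accuracy against fresh ground truth). The function name, window size, and threshold are hypothetical.

```python
from statistics import mean


def should_retrain(recent_scores, kpi_threshold, min_samples=5):
    """Return True when the average of the most recent scores drops below the KPI threshold."""
    if len(recent_scores) < min_samples:
        return False  # Not enough evidence yet; keep the current model.
    return mean(recent_scores[-min_samples:]) < kpi_threshold


healthy = [0.91, 0.90, 0.92, 0.91, 0.90]
drifting = [0.90, 0.86, 0.82, 0.79, 0.76]
print(should_retrain(healthy, kpi_threshold=0.85))   # False
print(should_retrain(drifting, kpi_threshold=0.85))  # True
```

Averaging over a window rather than reacting to a single low score avoids triggering a training job, and its energy use, because of one noisy measurement.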

Consider SageMaker Pipelines, the AWS Step Functions Data Science SDK for Amazon SageMaker, or third-party tools to automate your retraining pipelines.

Measure results and improve

To monitor and quantify improvements during the inference phase, track the following metrics:

For storage:


AI/ML workloads can be energy intensive, but as called out by the UN and mentioned in the last IPCC report, AI can contribute to mitigation of climate change and the achievement of several Sustainable Development Goals. As technology builders, it’s our responsibility to make sustainable use of AI and ML.

In this blog post series, we presented best practices you can use to make sustainability-conscious architectural decisions and reduce the environmental impact for your AI/ML workloads.

Other posts in this series

About the Well-Architected Framework

These practices are part of the Sustainability Pillar of the AWS Well-Architected Framework. AWS Well-Architected is a set of guiding design principles developed by AWS to help organizations build secure, high-performing, resilient, and efficient infrastructure for a variety of applications and workloads. Use the AWS Well-Architected Tool to review your workloads periodically to address important design considerations and ensure that they follow the best practices and guidance of the AWS Well-Architected Framework. For follow-up questions or comments, join our growing community on AWS re:Post.


