Implement tag-based access control for your data lake and Amazon Redshift data sharing with AWS Lake Formation

Post Syndicated from Praveen Kumar original https://aws.amazon.com/blogs/big-data/implement-tag-based-access-control-for-your-data-lake-and-amazon-redshift-data-sharing-with-aws-lake-formation/

Data-driven organizations treat data as an asset and use it across different lines of business (LOBs) to drive timely insights and better business decisions. Many organizations have a distributed tools and infrastructure across various business units. This leads to having data across many instances of data warehouses and data lakes using a modern data architecture in separate AWS accounts.

Amazon Redshift data sharing allows you to securely share live, transactionally consistent data in one Amazon Redshift data warehouse with another Redshift data warehouse within the same AWS account, across accounts, and across Regions, without needing to copy or move data from one cluster to another. Customers want to be able to manage their permissions in a central place across all of their assets. Previously, the management of Redshift datashares was limited to only within Amazon Redshift, which made it difficult to manage your data lake permissions and Amazon Redshift permissions in a single place. For example, you had to navigate to an individual account to view and manage access information for Amazon Redshift and the data lake on Amazon Simple Storage Service (Amazon S3). As an organization grows, administrators want a mechanism to effectively and centrally manage data sharing across data lakes and data warehouses for governance and auditing, and to enforce fine-grained access control.

We recently announced the integration of Amazon Redshift data sharing with AWS Lake Formation. With this feature, Amazon Redshift customers can now manage sharing, apply access policies centrally, and effectively scale the permission using LF-Tags.

Lake Formation has been a popular choice for centrally governing data lakes backed by Amazon S3. Now, with Lake Formation support for Amazon Redshift data sharing, it opens up new design patterns and broadens governance and security posture across data warehouses. With this integration, you can use Lake Formation to define fine-grained access control on tables and views being shared with Amazon Redshift data sharing for federated AWS Identity and Access Management (IAM) users and IAM roles. Lake Formation also provides tag-based access control (TBAC), which can be used to simplify and scale governance of data catalog objects such as databases and tables.

In this post, we discuss this new feature and how to implement TBAC for your data lake and Amazon Redshift data sharing on Lake Formation.

Solution overview

Lake Formation tag-based access control (LF-TBAC) allows you to group similar AWS Glue Data Catalog resources together and define the grant or revoke permissions policy by using an LF-Tag expression. LF-Tags are hierarchical in that when a database is tagged with an LF-Tag, all tables in that database inherit the tag, and when a LF-Tag is applied to a table, all the columns within that table inherit the tag. Inherited tags then can be overridden if needed. You then can create access policies within Lake Formation using LF-Tag expressions to grant principals access to tagged resources using an LF-Tag expression. See Managing LF-Tags for metadata access control for more details.

To demonstrate LF-TBAC with central data access governance capability, we use the scenario where two separate business units own particular datasets and need to share data across teams.

We have a customer care team who manages and owns the customer information database including customer demographics data. And have a marketing team who owns a customer leads dataset, which includes information on prospective customers and contact leads.

To be able to run effective campaigns, the marketing team needs access to the customer data. In this post, we demonstrate the process of sharing this data that is stored in the data warehouse and giving the marketing team access. Furthermore, there are personally identifiable information (PII) columns within the customer dataset that should only be accessed by a subset of power users on a need-to-know basis. This way, data analysts within marketing can only see non-PII columns to be able to run anonymous customer segment analysis, but a group of power users can access PII columns (for example, customer email address) to be able to run campaigns or surveys for specific groups of customers.

The following diagram shows the structure of the datasets that we work with in this post and a tagging strategy to provide fine-grained column-level access.

Beyond our tagging strategy on the data resources, the following table gives an overview of how we should grant permissions to our two personas via tags.

IAM Role Persona Resource Type Permission LF-Tag expression
marketing-analyst A data analyst in the marketing team DB describe (department:marketing OR department:customer) AND classification:private
. Table select (department:marketing OR department:customer) AND classification:private
. . . . .
marketing-poweruser A privileged user in the marketing team DB describe (department:marketing OR department:customer) AND classification: private
. Table (Column) select (department:marketing OR department:customer) AND (classification:private OR classification:pii-sensitive)

The following diagram gives a high-level overview of the setup that we deploy in this post.

The following is a high-level overview of how to use Lake Formation to control datashare permissions:

Producer Setup:

  1. In the producers AWS account, the Amazon Redshift administrator that owns the customer database creates a Redshift datashare on the producer cluster and grants usage to the AWS Glue Data Catalog in the same account.
  2. The producer cluster administrator authorizes the Lake Formation account to access the datashare.
  3. In Lake Formation, the Lake Formation administrator discovers and registers the datashares. They must discover the AWS Glue ARNs they have access to and associate the datashares with an AWS Glue Data Catalog ARN. If you’re using the AWS Command Line Interface (AWS CLI), you can discover and accept datashares with the Redshift CLI operations describe-data-shares and associate-data-share-consumer. To register a datashare, use the Lake Formation CLI operation register-resource.
  4. The Lake Formation administrator creates a federated database in the AWS Glue Data Catalog; assigns tags to the databases, tables, and columns; and configures Lake Formation permissions to control user access to objects within the datashare. For more information about federated databases in AWS Glue, see Managing permissions for data in an Amazon Redshift datashare.

Consumer Setup:

  1. On the consumer side (marketing), the Amazon Redshift administrator discovers the AWS Glue database ARNs they have access to, creates an external database in the Redshift consumer cluster using an AWS Glue database ARN, and grants usage to database users authenticated with IAM credentials to start querying the Redshift database.
  2. Database users can use the views SVV_EXTERNAL_TABLES and SVV_EXTERNAL_COLUMNS to find all the tables or columns within the AWS Glue database that they have access to; then they can query the AWS Glue database’s tables.

When the producer cluster administrator decides to no longer share the data with the consumer cluster, the producer cluster administrator can revoke usage, deauthorize, or delete the datashare from Amazon Redshift. The associated permissions and objects in Lake Formation are not automatically deleted.

Prerequisites:

To follow the steps in this post, you must satisfy the following prerequisites:

Deploy environment including producer and consumer Redshift clusters

To follow along the steps outlined in this post, deploy following AWS CloudFormation stack that includes necessary resources to demonstrate the subject of this post:

  1. Choose Launch stack to deploy a CloudFormation template.
  2. Provide an IAM role that you have already configured as a Lake Formation administrator.
  3. Complete the steps to deploy the template and leave all settings as default.
  4. Select I acknowledge that AWS CloudFormation might create IAM resources, then choose Submit.

This CloudFormation stack creates the following resources:

  • Producer Redshift cluster – Owned by the customer care team and has customer and demographic data on it.
  • Consumer Redshift cluster – Owned by the marketing team and is used to analyze data across data warehouses and data lakes.
  • S3 data lake – Contains the web activity and leads datasets.
  • Other necessary resources to demonstrate the process of sharing data – For example, IAM roles, Lake Formation configuration, and more. For a full list of resources created by the stack, examine the CloudFormation template.

After you deploy this CloudFormation template, resources created will incur cost to your AWS account. At the end of the process, make sure that you clean up resources to avoid unnecessary charges.

After the CloudFormation stack is deployed successfully (status shows as CREATE_COMPLETE), take note of the following items on the Outputs tab:

  • Marketing analyst role ARN
  • Marketing power user role ARN
  • URL for Amazon Redshift admin password stored in AWS Secrets Manager

Create a Redshift datashare and add relevant tables

On the AWS Management Console, switch to the role that you nominated as Lake Formation admin when deploying the CloudFormation template. Then go to Query Editor v2. If this is the first time using Query Editor V2 in your account, follow these steps to configure your AWS account.

The first step in Query Editor is to log in to the customer Redshift cluster using the database admin credentials to make your IAM admin role a DB admin on the database.

  1. Choose the options menu (three dots) next to the lfunified-customer-dwh cluster and choose Create connection.

  2. Select Database user name and password.
  3. Leave Database as dev.
  4. For User name, enter admin.
  5. For Password, complete the following steps:
    1. Go to the console URL, which is the value of the RedShiftClusterPassword CloudFormation output in previous step. The URL is the Secrets Manager console for this password.
    2. Scroll down to the Secret value section and choose Retrieve secret value.
    3. Take note of the password to use later when connecting to the marketing Redshift cluster.
    4. Enter this value for Password.
  6. Choose Create connection.

Create a datashare using a SQL command

Complete the following steps to create a datashare in the data producer cluster (customer care) and share it with Lake Formation:

  1. On the Amazon Redshift console, in the navigation pane, choose Editor, then Query editor V2.
  2. Choose (right-click) the cluster name and choose Edit connection or Create connection.
  3. For Authentication, select Temporary credentials using your IAM identity.

Refer to Connecting to an Amazon Redshift database to learn more about the various authentication methods.

  1. For Database, enter a database name (for this post, dev).
  2. Choose Create connection to connect to the database.
  3. Run the following SQL commands to create the datashare and add the data objects to be shared:
    create datashare customer_ds;
    ALTER DATASHARE customer_ds ADD SCHEMA PUBLIC;
    ALTER DATASHARE customer_ds ADD TABLE customer;

  4. Run the following SQL command to share the customer datashare to the current account via the AWS Glue Data Catalog:
    GRANT USAGE ON DATASHARE customer_ds TO ACCOUNT '<aws-account-id>' via DATA CATALOG;

  5. Verify the datashare was created and objects shared by running the following SQL command:
    DESC DATASHARE customer_ds;

Take note of the datashare producer cluster name space and account ID, which will be used in the following step. You can complete the following actions on the console, but for simplicity, we use AWS CLI commands.

  1. Go to CloudShell or your AWS CLI and run the following AWS CLI command to authorize the datashare to the Data Catalog so that Lake Formation can manage them:
    aws redshift authorize-data-share \
    --data-share-arn 'arn:aws:redshift:<aws-region>:<aws-account-id>:datashare:<producer-cluster-namespace>/customer_ds' \
    --consumer-identifier DataCatalog/<aws-account-id>

The following is an example output:

 {
    "DataShareArn": "arn:aws:redshift:us-east-2:<aws-account-id>:datashare:cd8d91b5-0c17-4567-a52a-59f1bdda71cd/customer_ds",
    "ProducerArn": "arn:aws:redshift:us-east-2:<aws-account-id>:namespace:cd8d91b5-0c17-4567-a52a-59f1bdda71cd",
    "AllowPubliclyAccessibleConsumers": false,
    "DataShareAssociations": [{
        "ConsumerIdentifier": "DataCatalog/<aws-account-id>XX",
        "Status": "AUTHORIZED",
        "CreatedDate": "2022-11-09T21:10:30.507000+00:00",
        "StatusChangeDate": "2022-11-09T21:10:50.932000+00:00"
    }]
}

Take note of your datashare ARN that you used in this command to use in the next steps.

Accept the datashare in the Lake Formation catalog

To accept the datashare, complete the following steps:

  1. Run the following AWS CLI command to accept and associate the Amazon Redshift datashare to the AWS Glue Data Catalog:
    aws redshift associate-data-share-consumer --data-share-arn 'arn:aws:redshift:<aws-region>:<aws-account-id>:datashare:<producer-cluster-namespace>/customer_ds' \
    --consumer-arn arn:aws:glue:<aws-region>:<aws-account-id>:catalog

The following is an example output:

{
 "DataShareArn": "arn:aws:redshift:us-east-2:<aws-account-id>:datashare:cfd5fcbd-3492-42b5-9507-dad5d87f7427/customer_ds",
 "ProducerArn": "arn:aws:redshift:us-east-2:<aws-account-id>:namespace:cfd5fcbd-3492-42b5-9507-dad5d87f7427",
 "AllowPubliclyAccessibleConsumers": false,
 "DataShareAssociations": [
 {
 "ConsumerIdentifier": "arn:aws:glue:us-east-2:<aws-account-id>:catalog",
 "Status": "ACTIVE",
 "ConsumerRegion": "us-east-2",
 "CreatedDate": "2023-05-18T12:25:11.178000+00:00",
 "StatusChangeDate": "2023-05-18T12:25:11.178000+00:00"
 }
 ]
}
  1. Register the datashare in Lake Formation:
    aws lakeformation register-resource \
     --resource-arn arn:aws:redshift:<aws-region>:<producer-aws-account-id>:datashare:<producer-cluster-namespace>/customer_ds

  2. Create the AWS Glue database that points to the accepted Redshift datashare:
    aws glue create-database --region <aws-region> --cli-input-json '{
        "CatalogId": "<aws-account-id>",
        "DatabaseInput": {
            "Name": "customer_db_shared",
            "FederatedDatabase": {
                "Identifier": "arn:aws:redshift:<aws-region>:<producer-aws-account-id>:datashare:<producer-cluster-namespace>/customer_ds",
                "ConnectionName": "aws:redshift"
            }
        }
    }'

  3. To verify, go to the Lake Formation console and check that the database customer_db_shared is created.

Now the data lake administrator can view and grant access on both the database and tables to the data consumer team (marketing) personas using Lake Formation TBAC.

Assign Lake Formation tags to resources

Before we grant appropriate access to the IAM principals of the data analyst and power user within the marketing team, we have to assign LF-tags to tables and columns of the customer_db_shared database. We then grant these principals permission to appropriate LF-tags.

To assign LF-tags, follow these steps:

  1. Assign the department and classification LF-tag to customer_db_shared (Redshift datashare) based on the tagging strategy table in the solution overview. You can run the following actions on the console, but for this post, we use the following AWS CLI command:
    aws lakeformation add-lf-tags-to-resource --cli-input-json '{
        "CatalogId": "<aws-account-id>",
        "Resource": {
        "Database": {
        "CatalogId": "<aws-account-id>",
        "Name": "customer_db_shared"
        }
        },
        "LFTags": [
        {
        "CatalogId": "<aws-account-id>",
        "TagKey": "department",
        "TagValues": [
        "customer"]
        },
        {
        "CatalogId": "<aws-account-id>",
        "TagKey": "classification",
        "TagValues": [
        "private"]
        }
        ]
        }'

If the command is successful, you should get a response like the following:

{
"Failures": []
}
  1. Assign the appropriate department and classification LF-tag to marketing_db (on the S3 data lake):
    aws lakeformation add-lf-tags-to-resource --cli-input-json '{
        "CatalogId": "<aws-account-id>",
        "Resource": {
        "Database": {
        "CatalogId": "<aws-account-id>",
        "Name": "lfunified_marketing_dl_db"
        }
        },
        "LFTags": [
        {
        "CatalogId": "<aws-account-id>",
        "TagKey": "department",
        "TagValues": [
        "marketing"]
        },
        {
        "CatalogId": "<aws-account-id>",
        "TagKey": "classification",
        "TagValues": [
        "private"]
        }
        ]
        }'

Note that although you only assign the department and classification tag on the database level, it gets inherited by the tables and columns within that database.

  1. Assign the classification pii-sensitive LF-tag to PII columns of the customer table to override the inherited value from the database level:
    aws lakeformation add-lf-tags-to-resource --cli-input-json '{
        "CatalogId": "<aws-account-id>",
        "Resource": {
        "TableWithColumns": {
        "CatalogId": "<aws-account-id>",
        "DatabaseName": "customer_db_shared",
        "Name": "public.customer",
        "ColumnNames":["c_first_name","c_last_name","c_email_address"]
        }
        },
        "LFTags": [
        {
        "CatalogId": "<aws-account-id>",
        "TagKey": "classification",
        "TagValues": [
        "pii-sensitive"]
        }
        ]
        }'

Grant permission based on LF-tag association

Run the following two AWS CLI commands to allow the marketing data analyst access to the customer table excluding the pii-sensitive (PII) columns. Replace the value for DataLakePrincipalIdentifier with the MarketingAnalystRoleARN that you noted from the outputs of the CloudFormation stack:

aws lakeformation grant-permissions --cli-input-json '{
    "CatalogId": "<aws-account-id>",
    "Principal": {"DataLakePrincipalIdentifier" : "<MarketingAnalystRoleARN-from-CloudFormation-Outputs>"},
    "Resource": {
    "LFTagPolicy": {
    "CatalogId": "<aws-account-id>",
    "ResourceType": "DATABASE",
    "Expression": [{"TagKey": "department","TagValues": ["marketing","customer"]},{"TagKey": "classification","TagValues": ["private"]}]
    }
    },
    "Permissions": [
    "DESCRIBE"
    ],
    "PermissionsWithGrantOption": []
}'
aws lakeformation grant-permissions --cli-input-json '{
    "CatalogId": "<aws-account-id>",
    "Principal": {"DataLakePrincipalIdentifier" : "<MarketingAnalystRoleARN-from-CloudFormation-Outputs>"},
    "Resource": {
    "LFTagPolicy": {
    "CatalogId": "<aws-account-id>",
    "ResourceType": "TABLE",
    "Expression": [{"TagKey": "department","TagValues": ["marketing","customer"]},{"TagKey": "classification","TagValues": ["private"]}]
    }
    },
    "Permissions": [
    "SELECT"
    ],
    "PermissionsWithGrantOption": []
}'

We have now granted marketing analysts access to the customer database and tables that are not pii-sensitive.

To allow marketing power users access to table columns with restricted LF-tag (PII columns), run the following AWS CLI command:

aws lakeformation grant-permissions --cli-input-json '{
    "CatalogId": "<aws-account-id>",
    "Principal": {"DataLakePrincipalIdentifier" : "<MarketingPowerUserRoleARN-from-CloudFormation-Outputs>"},
    "Resource": {
    "LFTagPolicy": {
    "CatalogId": "<aws-account-id>",
    "ResourceType": "DATABASE",
    "Expression": [{"TagKey": "department","TagValues": ["marketing","customer"]},{"TagKey": "classification","TagValues": ["private"]}]
    }
    },
    "Permissions": [
    "DESCRIBE"
    ],
    "PermissionsWithGrantOption": []
}'
aws lakeformation grant-permissions --cli-input-json '{
    "CatalogId": "<aws-account-id>",
    "Principal": {"DataLakePrincipalIdentifier" : "<MarketingPowerUserRoleARN-from-CloudFormation-Outputs>"},
    "Resource": {
    "LFTagPolicy": {
    "CatalogId": "<aws-account-id>",
    "ResourceType": "TABLE",
    "Expression": [{"TagKey": "department","TagValues": ["marketing","customer"]},{"TagKey": "classification","TagValues": ["private", "pii-sensitive"]}]
    }
    },
    "Permissions": [
    "SELECT"
    ],
    "PermissionsWithGrantOption": []
}'

We can combine the grants into a single batch grant permissions call:

aws lakeformation batch-grant-permissions --region us-east-1 --cli-input-json '{
    "CatalogId": "<aws-account-id>",
 "Entries": [
 {  "Id": "1",
    "Principal": {"DataLakePrincipalIdentifier" : "arn:aws:iam:: <aws-account-id>:role/Blog-MarketingAnalystRole-1CYV6JSNN14E3"},
    "Resource": {
    "LFTagPolicy": {
    "CatalogId": "<aws-account-id>",
    "ResourceType": "DATABASE",
    "Expression": [{"TagKey": "department","TagValues": ["marketing","customer"]},{"TagKey": "classification","TagValues": ["private"]}]
    }
    },
    "Permissions": [
    "DESCRIBE"
    ],
    "PermissionsWithGrantOption": []
    },
    {  "Id": "2",
    "Principal": {"DataLakePrincipalIdentifier" : "arn:aws:iam:: <aws-account-id>:role/Blog-MarketingAnalystRole-1CYV6JSNN14E3"},
    "Resource": {
    "LFTagPolicy": {
    "CatalogId": "<aws-account-id>",
    "ResourceType": "TABLE",
    "Expression": [{"TagKey": "department","TagValues": ["marketing","customer"]},{"TagKey": "classification","TagValues": ["private"]}]
    }
    },
    "Permissions": [
    "SELECT"
    ],
    "PermissionsWithGrantOption": []
    },
     {  "Id": "3",
    "Principal": {"DataLakePrincipalIdentifier" : "arn:aws:iam:: <aws-account-id>:role/Blog-MarketingPoweruserRole-RKKM0TWQBP0W"},
    "Resource": {
    "LFTagPolicy": {
    "CatalogId": "<aws-account-id>",
    "ResourceType": "DATABASE",
    "Expression": [{"TagKey": "department","TagValues": ["marketing","customer"]},{"TagKey": "classification","TagValues": ["private", "pii-sensitive"]}]
    }
    },
    "Permissions": [
    "DESCRIBE"
    ],
    "PermissionsWithGrantOption": []
    },
    {  "Id": "4",
    "Principal": {"DataLakePrincipalIdentifier" : "arn:aws:iam:: <aws-account-id>:role/Blog-MarketingPoweruserRole-RKKM0TWQBP0W"},
    "Resource": {
    "LFTagPolicy": {
    "CatalogId": "<aws-account-id>",
    "ResourceType": "TABLE",
    "Expression": [{"TagKey": "department","TagValues": ["marketing","customer"]},{"TagKey": "classification","TagValues": ["private", "pii-sensitive"]}]
    }
    },
    "Permissions": [
    "SELECT"
    ],
    "PermissionsWithGrantOption": []
    }
    ]
 }'

Validate the solution

In this section, we go through the steps to test the scenario.

Consume the datashare in the consumer (marketing) data warehouse

To enable the consumers (marketing team) to access the customer data shared with them via the datashare, first we have to configure Query Editor v2. This configuration is to use IAM credentials as the principal for the Lake Formation permissions. Complete the following steps:

  1. Sign in to the console using the admin role you nominated in running the CloudFormation template step.
  2. On the Amazon Redshift console, go to Query Editor v2.
  3. Choose the gear icon in the navigation pane, then choose Account settings.
  4. Under Connection settings, select Authenticate with IAM credentials.
  5. Choose Save.

Now let’s connect to the marketing Redshift cluster and make the customer database available to the marketing team.

  1. Choose the options menu (three dots) next to the Serverless:lfunified-marketing-wg cluster and choose Create connection.
  2. Select Database user name and password.
  3. Leave Database as dev.
  4. For User name, enter admin.
  5. For Password, enter the same password you retrieved from Secrets Manger in an earlier step.
  6. Choose Create connection.
  7. Once successfully connected, choose the plus sign and choose Editor to open a new Query Editor tab.
  8. Make sure that you specify the Serverless: lfunified-marketing-wg workgroup and dev database.
  9. To create the Redshift database from the shared catalog database, run the following SQL command on the new tab:
    CREATE DATABASE ext_customerdb_shared FROM ARN 'arn:aws:glue:<aws-region>:<aws-account-id>:database/customer_db_shared' WITH DATA CATALOG SCHEMA "customer_db_shared"

  10. Run the following SQL commands to create and grant usage on the Redshift database to the IAM roles for the power users and data analyst. You can get the IAM role names from the CloudFormation stack outputs:
    CREATE USER IAMR:"lf-redshift-ds-MarketingAnalystRole-XXXXXXXXXXXX" password disable;
    GRANT USAGE ON DATABASE ext_customerdb_shared to IAMR:"lf-redshift-ds-MarketingAnalystRole-XXXXXXXXXXXX";
    
    CREATE USER IAMR:"lf-redshift-ds-MarketingPoweruserRole-YYYYYYYYYYYY" password disable;
    GRANT USAGE ON DATABASE ext_customerdb_shared to IAMR:"lf-redshift-ds-MarketingPoweruserRole-YYYYYYYYYYYY";

Create the data lake schema in AWS Glue and allow the marketing power role to query the lead and web activity data

Run the following SQL commands to make the lead data in the S3 data lake available to the marketing team:

create external schema datalake from data catalog
database 'lfunified_marketing_dl_db' 
iam_role 'SESSION'
catalog_id '<aws-account-id>';
GRANT USAGE ON SCHEMA datalake TO IAMR:"lf-redshift-ds-MarketingAnalystRole-XXXXXXXXXXXX";
GRANT USAGE ON SCHEMA datalake TO IAMR:"lf-redshift-ds-MarketingPoweruserRole-YYYYYYYYYYYY";

Query the shared dataset as a marketing analyst user

To validate that the marketing team analysts (IAM role marketing-analyst-role) have access to the shared database, perform the following steps:

  1. Sign in to the console (for convenience, you can use a different browser) and switch your role to lf-redshift-ds-MarketingAnalystRole-XXXXXXXXXXXX.
  2. On the Amazon Redshift console, go to Query Editor v2.
  3. To connect to the consumer cluster, choose the Serverless: lfunified-marketing-wg consumer data warehouse in the navigation pane.
  4. When prompted, for Authentication, select Federated user.
  5. For Database, enter the database name (for this post, dev).
  6. Choose Save.
  7. Once you’re connected to the database, you can validate the current logged-in user with the following SQL command:
    select current_user;

  8. To find the federated databases created on the consumer account, run the following SQL command:
    SHOW DATABASES FROM DATA CATALOG ACCOUNT '<aws-account-id>';

  9. To validate permissions for the marketing analyst role, run the following SQL command:
    select * from ext_customerdb_shared.public.customer limit 10;

As you can see in the following screenshot, the marketing analyst is able to successfully access the customer data but only the non-PII attributes, which was our intention.

  1. Now let’s validate that the marketing analyst doesn’t have access to the PII columns of the same table:
    select c_customer_email from ext_customerdb_shared.public.customer limit 10;

Query the shared datasets as a marketing power user

To validate that the marketing power users (IAM role lf-redshift-ds-MarketingPoweruserRole-YYYYYYYYYYYY) have access to pii-sensetive columns in the shared database, perform the following steps:

  1. Sign in to the console (for convenience, you can use a different browser) and switch your role to lf-redshift-ds-MarketingPoweruserRole-YYYYYYYYYYYY.
  2. On the Amazon Redshift console, go to Query Editor v2.
  3. To connect to the consumer cluster, choose the Serverless: lfunified-marketing-wg consumer data warehouse in the navigation pane.
  4. When prompted, for Authentication, select Federated user.
  5. For Database, enter the database name (for this post, dev).
  6. Choose Save.
  7. Once you’re connected to the database, you can validate the current logged-in user with the following SQL command:
    select current_user;

  8. Now let’s validate that the marketing power role has access to the PII columns of the customer table:
    select c_customer_id, c_first_name, c_last_name,c_customer_email from customershareddb.public.customer limit 10;

  9. Validate that the power users within the marketing team can now run a query to combine data across different datasets that they have access to in order to run effective campaigns:
    SELECT
        emailaddress as emailAddress,  customer.c_first_name as firstName, customer.c_last_name as lastName, leadsource, contactnotes, usedpromo
    FROM
        "dev"."datalake"."lead" as lead
    JOIN ext_customerdb_shared.public.customer as customer
    ON lead.emailaddress = customer.c_email_address
    WHERE lead.donotreachout = 'false'

Clean up

After you complete the steps in this post, to clean up resources, delete the CloudFormation stack:

  1. On the AWS CloudFormation console, select the stack you deployed in the beginning of this post.
  2. Choose Delete and follow the prompts to delete the stack.

Conclusion

In this post, we showed how you can use Lake Formation tags and manage permissions for your data lake and Amazon Redshift data sharing using Lake Formation. Using Lake Formation LF-TBAC for data governance helps you manage your data lake and Amazon Redshift data sharing permissions at scale. Also, it enables data sharing across business units with fine-grained access control. Managing access to your data lake and Redshift datashares in a single place enables better governance, helping with data security and compliance.

If you have questions or suggestions, submit them in the comments section.

For more information on Lake Formation managed Amazon Redshift data sharing and tag-based access control, refer to Centrally manage access and permissions for Amazon Redshift data sharing with AWS Lake Formation and Easily manage your data lake at scale using AWS Lake Formation Tag-based access control.


About the Authors

Praveen Kumar is an Analytics Solution Architect at AWS with expertise in designing, building, and implementing modern data and analytics platforms using cloud-native services. His areas of interests are serverless technology, modern cloud data warehouses, streaming, and ML applications.

Srividya Parthasarathy is a Senior Big Data Architect on the AWS Lake Formation team. She enjoys building data mesh solutions and sharing them with the community.

Paul Villena is an Analytics Solutions Architect in AWS with expertise in building modern data and analytics solutions to drive business value. He works with customers to help them harness the power of the cloud. His areas of interests are infrastructure as code, serverless technologies, and coding in Python.

Mostafa Safipour is a Solutions Architect at AWS based out of Sydney. He works with customers to realize business outcomes using technology and AWS. Over the past decade, he has helped many large organizations in the ANZ region build their data, digital, and enterprise workloads on AWS.

Migrating your secrets to AWS Secrets Manager, Part 2: Implementation

Post Syndicated from Adesh Gairola original https://aws.amazon.com/blogs/security/migrating-your-secrets-to-aws-secrets-manager-part-2-implementation/

In Part 1 of this series, we provided guidance on how to discover and classify secrets and design a migration solution for customers who plan to migrate secrets to AWS Secrets Manager. We also mentioned steps that you can take to enable preventative and detective controls for Secrets Manager. In this post, we discuss how teams should approach the next phase, which is implementing the migration of secrets to Secrets Manager. We also provide a sample solution to demonstrate migration.

Implement secrets migration

Application teams lead the effort to design the migration strategy for their application secrets. Once you’ve made the decision to migrate your secrets to Secrets Manager, there are two potential options for migration implementation. One option is to move the application to AWS in its current state and then modify the application source code to retrieve secrets from Secrets Manager. Another option is to update the on-premises application to use Secrets Manager for retrieving secrets. You can use features such as AWS Identity and Access Management (IAM) Roles Anywhere to make the application communicate with Secrets Manager even before the migration, which can simplify the migration phase.

If the application code contains hardcoded secrets, the code should be updated so that it references Secrets Manager. A good interim state would be to pass these secrets as environment variables to your application. Using environment variables helps in decoupling the secrets retrieval logic from the application code and allows for a smooth cutover and rollback (if required).

Cutover to Secrets Manager should be done in a maintenance window. This minimizes downtime and impacts to production.

Before you perform the cutover procedure, verify the following:

  • Application components can access Secrets Manager APIs. Based on your environment, this connectivity might be provisioned through interface virtual private cloud (VPC) endpoints or over the internet.
  • Secrets exist in Secrets Manager and have the correct tags. This is important if you are using attribute-based access control (ABAC).
  • Applications that integrate with Secrets Manager have the required IAM permissions.
  • Have a well-documented cutover and rollback plan that contains the changes that will be made to the application during cutover. These would include steps like updating the code to use environment variables and updating the application to use IAM roles or instance profiles (for apps that are being migrated to Amazon Elastic Compute Cloud (Amazon EC2)).

After the cutover, verify that Secrets Manager integration was successful. You can use AWS CloudTrail to confirm that application components are using Secrets Manager.

We recommend that you further optimize your integration by enabling automatic secrets rotation. If your secrets were previously widely accessible (for example, they were stored in your Git repositories), we recommend rotating as soon as possible when migrating .

Sample application to demo integration with Secrets Manager

In the next sections, we present a sample AWS Cloud Development Kit (AWS CDK) solution that demonstrates the implementation of the previously discussed guardrails, design, and migration strategy. You can use the sample solution as a starting point and expand upon it. It includes components that environment teams may deploy to help provide potentially secure access for application teams to migrate their secrets to Secrets Manager. The solution uses ABAC, a tagging scheme, and IAM Roles Anywhere to demonstrate regulated access to secrets for application teams. Additionally, the solution contains client-side utilities to assist application and migration teams in updating secrets. Teams with on-premises applications that are seeking integration with Secrets Manager before migration can use the client-side utility for access through IAM Roles Anywhere.

The sample solution is hosted on the aws-secrets-manager-abac-authorization-samples GitHub repository and is made up of the following components:

  • A common environment infrastructure stack (created and owned by environment teams). This stack provisions the following resources:
    • A sample VPC created with Amazon Virtual Private Cloud (Amazon VPC), with PUBLIC, PRIVATE_WITH_NAT, and PRIVATE_ISOLATED subnet types.
    • VPC endpoints for the AWS Key Management Service (AWS KMS) and Secrets Manager services to the sample VPC. The use of VPC endpoints means that calls to AWS KMS and Secrets Manager are not made over the internet and remain internal to the AWS backbone network.
    • An empty shell secret, tagged with the supplied attributes and an IAM managed policy that uses attribute-based access control conditions. This means that the secret is managed in code, but the actual secret value is not visible in version control systems like GitHub or in AWS CloudFormation parameter inputs. 
  • An IAM Roles Anywhere infrastructure stack (created and owned by environment teams). This stack provisions the following resources:
    • An AWS Certificate Manager Private Certificate Authority (AWS Private CA).
    • An IAM Roles Anywhere public key infrastructure (PKI) trust anchor that uses AWS Private CA.
    • An IAM role for the on-premises application that uses the common environment infrastructure stack.
    • An IAM Roles Anywhere profile.

    Note: You can choose to use your existing CAs as trust anchors. If you do not have a CA, the stack described here provisions a PKI for you. IAM Roles Anywhere allows migration teams to use Secrets Manager before the application is moved to the cloud. Post migration, you could consider updating the applications to use native IAM integration (like instance profiles for EC2 instances) and revoking IAM Roles Anywhere credentials.

  • A client-side utility (primarily used by application or migration teams). This is a shell script that does the following:
    • Assists in provisioning a certificate by using OpenSSL.
    • Uses aws_signing_helper (Credential Helper) to set up AWS CLI profiles by using the credential_process for IAM Roles Anywhere.
    • Assists application teams to access and update their application secrets after assuming an IAM role by using IAM Roles Anywhere.
  • A sample application stack (created and owned by the application/migration team). This is a sample serverless application that demonstrates the use of the solution. It deploys the following components, which indicate that your ABAC-based IAM strategy is working as expected and is effectively restricting access to secrets:
    • The sample application stack uses a VPC-deployed common environment infrastructure stack.
    • It deploys an Amazon Aurora MySQL serverless cluster in the PRIVATE_ISOLATED subnet and uses the secret that is created through a common environment infrastructure stack.
    • It deploys a sample Lambda function in the PRIVATE_WITH_NAT subnet.
    • It deploys two IAM roles for testing:
      • allowedRole (default role): When the application uses this role, it is able to use the GET action to get the secret and open a connection to the Aurora MySQL database.
      • Not allowedRole: When the application uses this role, it is unable to use the GET action to get the secret and open a connection to the Aurora MySQL database.

Prerequisites to deploy the sample solution

The following software packages need to be installed in your development environment before you deploy this solution:

Note: In this section, we provide examples of AWS CLI commands and configuration for Linux or macOS operating systems. For instructions on using AWS CLI on Windows, refer to the AWS CLI documentation.

Before deployment, make sure that the correct AWS credentials are configured in your terminal session. The credentials can be either in the environment variables or in ~/.aws. For more details, see Configuring the AWS CLI.

Next, use the following commands to set your AWS credentials to deploy the stack:

export AWS_ACCESS_KEY_ID=<>
export AWS_SECRET_ACCESS_KEY=<>
export AWS_REGION = <>

You can view the IAM credentials that are being used by your session by running the command aws sts get-caller-identity. If you are running the cdk command for the first time in your AWS account, you will need to run the following cdk bootstrap command to provision a CDK Toolkit stack that will manage the resources necessary to enable deployment of cloud applications with the AWS CDK.

cdk bootstrap aws://<AWS account number>/<Region> # Bootstrap CDK in the specified account and AWS Region

Select the applicable archetype and deploy the solution

This section outlines the design and deployment steps for two archetypes:

Archetype 1: Application is currently on premises

Archetype 1 has the following requirements:

  • The application is currently hosted on premises.
  • The application would consume API keys, stored credentials, and other secrets in Secrets Manager.

The application, environment and security teams work together to define a tagging strategy that will be used to restrict access to secrets. After this, the proposed workflow for each persona is as follows:

  1. The environment engineer deploys a common environment infrastructure stack (as described earlier in this post) to bootstrap the AWS account with secrets and IAM policy by using the supplied tagging requirement.
  2. Additionally, the environment engineer deploys the IAM Roles Anywhere infrastructure stack.
  3. The application developer updates the secrets required by the application by using the client-side utility (helper.sh).
  4. The application developer uses the client-side utility to update the AWS CLI profile to consume the IAM Roles Anywhere role from the on-premises servers.

    Figure 1 shows the workflow for Archetype 1.

    Figure 1: Application on premises connecting to Secrets Manager

    Figure 1: Application on premises connecting to Secrets Manager

To deploy Archetype 1

  1. (Actions by the application team persona) Clone the repository and update the tagging details at configs/tagconfig.json.

    Note: Do not modify the tag/attributes name/key, only modify value.

  2. (Actions by the environment team persona) Run the following command to deploy the common environment infrastructure stack.
    ./helper.sh prepare
    Then, run the following command to deploy the IAM Roles Anywhere infrastructure stack../helper.sh on-prem
  3. (Actions by the application team persona) Update the secret value of the dummy secrets provided by the environment team, by using the following command.
    ./helper.sh update-secret

    Note: This command will only update the secret if it’s still using the dummy value.

    Then, run the following command to set up the client and server on premises../helper.sh client-profile-setup

    Follow the command prompt. It will help you request a client certificate and update the AWS CLI profile.

    Important: When you request a client certificate, make sure to supply at least one distinguished name, like CommonName.

The sample output should look like the following.


‐‐> This role can be used by the application by using the AWS CLI profile 'developer'.
‐‐> For instance, the following output illustrates how to access secret values by using the AWS CLI profile 'developer'.
‐‐> Sample AWS CLI: aws secretsmanager get-secret-value ‐‐secret-id $SECRET_ARN ‐‐profile developer

At this point, the client-side utility (helper.sh client-profile-setup) should have updated the AWS CLI configuration file with the following profile.

[profile developer]
region = <aws-region>
credential_process = /Users/<local-laptop-user>/.aws/aws_signing_helper credential-process
    ‐‐certificate /Users/<local-laptop-user>/.aws/client_cert.pem
    ‐‐private-key /Users/<local-laptop-user>/.aws/my_private_key.clear.key
    ‐‐trust-anchor-arn arn:aws:rolesanywhere:<aws-region>:444455556666:trust-anchor/a1b2c3d4-5678-90ab-cdef-EXAMPLE11111 
    ‐‐profile-arn arn:aws:rolesanywhere:<aws-region>:444455556666:profile/a1b2c3d4-5678-90ab-cdef-EXAMPLE22222 
    ‐‐role-arn arn:aws:iam::444455556666:role/RolesanywhereabacStack-onPremAppRole-1234567890ABC

To test Archetype 1 deployment

  • The application team can verify that the AWS CLI profile has been properly set up and is capable of retrieving secrets from Secrets Manager by running the following client-side utility command.
    ./helper.sh on-prem-test

This client-side utility (helper.sh) command verifies that the AWS CLI profile (for example, developer) has been set up for IAM Roles Anywhere and can run the GetSecretValue API action to retrieve the value of the secret stored in Secrets Manager.

The sample output should look like the following.

‐‐> Checking credentials ...
{
    "UserId": "AKIAIOSFODNN7EXAMPLE:EXAMPLE11111EXAMPLEEXAMPLE111111",
    "Account": "444455556666",
    "Arn": "arn:aws:sts::444455556666:assumed-role/RolesanywhereabacStack-onPremAppRole-1234567890ABC"
}
‐‐> Assume role worked for:
arn:aws:sts::444455556666:assumed-role/RolesanywhereabacStack-onPremAppRole-1234567890ABC
‐‐> This role can be used by the application by using the AWS CLI profile 'developer'. 
‐‐> For instance, the following output illustrates how to access secret values by using the AWS CLI profile 'developer'. 
‐‐> Sample AWS CLI: aws secretsmanager get-secret-value --secret-id $SECRET_ARN ‐‐profile $PROFILE_NAME
-------Output-------
{
  "password": "randomuniquepassword",
  "servertype": "testserver1",
  "username": "testuser1"
}
-------Output-------

Archetype 2: Application has migrated to AWS

Archetype 2 has the following requirement:

  • Deploy a sample application to demonstrate how ABAC authorization works for Secrets Manager APIs.

The application, environment, and security teams work together to define a tagging strategy that will be used to restrict access to secrets. After this, the proposed workflow for each persona is as follows:

  1. The environment engineer deploys a common environment infrastructure stack to bootstrap the AWS account with secrets and an IAM policy by using the supplied tagging requirement.
  2. The application developer updates the secrets required by the application by using the client-side utility (helper.sh).
  3. The application developer tests the sample application to confirm operability of ABAC.

Figure 2 shows the workflow for Archetype 2.

Figure 2: Sample migrated application connecting to Secrets Manager

Figure 2: Sample migrated application connecting to Secrets Manager

To deploy Archetype 2

  1. (Actions by the application team persona) Clone the repository and update the tagging details at configs/tagconfig.json.

    Note: Don’t modify the tag/attributes name/key, only modify value.

  2. (Actions by the environment team persona) Run the following command to deploy the common platform infrastructure stack.
    ./helper.sh prepare
  3. (Actions by the application team persona) Update the secret value of the dummy secrets provided by the environment team, using the following command.
    ./helper.sh update-secret

    Note: This command will only update the secret if it is still using the dummy value.

    Then, run the following command to deploy a sample app stack.
    ./helper.sh on-aws

    Note: If your secrets were migrated from a system that did not have the correct access controls, as a best security practice, you should rotate them at least once manually.

At this point, the client-side utility should have deployed a sample application Lambda function. This function connects to a MySQL database by using credentials stored in Secrets Manager. It retrieves the secret values, validates them, and establishes a connection to the database. The function returns a message that indicates whether the connection to the database is working or not.

To test Archetype 2 deployment

  • The application team can use the following client-side utility (helper.sh) to invoke the Lambda function and verify whether the connection is functional or not.
    ./helper.sh on-aws-test

The sample output should look like the following.

‐‐> Check if AWS CLI is installed
‐‐> AWS CLI found 
‐‐> Using tags to create Lambda function name and invoking a test 
‐‐> Checking the Lambda invoke response..... 
‐‐> The status code is 200
‐‐> Reading response from test function: 
"Connection to the DB is working."
‐‐> Response shows database connection is working from Lambda function using secret.

Conclusion

Building an effective secrets management solution requires careful planning and implementation. AWS Secrets Manager can help you effectively manage the lifecycle of your secrets at scale. We encourage you to take an iterative approach to building your secrets management solution, starting by focusing on core functional requirements like managing access, defining audit requirements, and building preventative and detective controls for secrets management. In future iterations, you can improve your solution by implementing more advanced functionalities like automatic rotation or resource policies for secrets.

To read Part 1 of this series, go to Migrating your secrets to AWS, Part I: Discovery and design.

If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, start a new thread on the AWS Secrets Manager re:Post or contact AWS Support.

Want more AWS Security news? Follow us on Twitter.

Adesh Gairola

Adesh Gairola

Adesh Gairola is a Senior Security Consultant at Amazon Web Services in Sydney, Australia. Adesh is eager to help customers build robust defenses, and design and implement security solutions that enable business transformations. He is always looking for new ways to help customers improve their security posture.

Eric Swamy

Eric Swamy

Eric is a Senior Security Consultant working in the Professional Services team in Sydney, Australia. He is passionate about helping customers build the confidence and technical capability to move their most sensitive workloads to cloud. When not at work, he loves to spend time with his family and friends outdoors, listen to music, and go on long walks.

Migrating your secrets to AWS Secrets Manager, Part I: Discovery and design

Post Syndicated from Eric Swamy original https://aws.amazon.com/blogs/security/migrating-your-secrets-to-aws-secrets-manager-part-i-discovery-and-design/

“An ounce of prevention is worth a pound of cure.” – Benjamin Franklin

A secret can be defined as sensitive information that is not intended to be known or disclosed to unauthorized individuals, entities, or processes. Secrets like API keys, passwords, and SSH keys provide access to confidential systems and resources, but it can be a challenge for organizations to maintain secure and consistent management of these secrets. Commonly observed anti-patterns in organizational secrets management systems include sharing plaintext secrets in emails or messaging apps, allowing application developers to view secrets in plaintext, hard-coding secrets into applications and storing them in version control systems, failing to rotate secrets regularly, and not logging and monitoring access to secrets.

We have created a two-part Amazon Web Services (AWS) blog post that provides prescriptive guidance on how you can use AWS Secrets Manager to help you achieve a cloud-based and modern secrets management system. In this first blog post, we discuss approaches to discover and classify secrets. In Part 2 of this series, we elaborate on the implementation phase and discuss migration techniques that will help you migrate your secrets to AWS Secrets Manager.

Managing secrets: Best practices and personas

A secret’s lifecycle comprises four phases: create, store, use, and destroy. An effective secrets management solution protects the secret in each of these phases from unauthorized access. Besides being secure, robust, scalable, and highly available, the secrets management system should integrate closely with other tools, solutions, and services that are being used within the organization. Legacy secret stores may lack integration with privileged access management (PAM), logging and monitoring, DevOps, configuration management, and encryption and auditing, which leads to teams not having uniform practices for consuming secrets and creates discrepancies from organizational policies.

Secrets Manager is a secrets management service that helps you protect access to your applications, services, and IT resources. This is a non-exhaustive list of features that AWS Secrets Manager offers:

  • Access control through AWS Identity and Access Management (IAM) — Secrets Manager offers built-in integration with the AWS Identity and Access Management (IAM) service. You can attach access control policies to IAM principals or to secrets themselves (by using resource-based policies).
  • Logging and monitoring — Secrets Manager integrates with AWS logging and monitoring services such as AWS CloudTrail and Amazon CloudWatch. This means that you can use your existing AWS logging and monitoring stack to log access to secrets and audit their usage.
  • Integration with other AWS services — Secrets Manager can store and manage the lifecycle of secrets created by other AWS services like Amazon Relational Database Service (Amazon RDS), Amazon Redshift, and Amazon QuickSight. AWS is constantly working on integrating more services with Secrets Manager.
  • Secrets encryption at rest — Secrets Manager integrates with AWS Key Management Service (AWS KMS). Secrets are encrypted at rest by using an AWS-managed key or customer-managed key.
  • Framework to support the rotation of secrets securely — Rotation helps limit the scope of a compromise and should be an integral part of a modern approach to secrets management. You can use Secrets Manager to schedule automatic database credentials rotation for Amazon RDS, Amazon Redshift, and Amazon DocumentDB. You can use customized AWS Lambda functions to extend the Secrets Manager rotation feature to other secret types, such as API keys and OAuth tokens for on-premises and cloud resources.

Security, cloud, and application teams within an organization need to work together cohesively to build an effective secrets management solution. Each of these teams has unique perspectives and responsibilities when it comes to building an effective secrets management solution, as shown in the following table.

Persona Responsibilities What they want What they don’t want
Security teams/security architect Define control objectives and requirements from the secrets management system Least privileged short-lived access, logging and monitoring, and rotation of secrets Secrets sprawl
Cloud team/environment team Implement controls, create guardrails, detect events of interest Scalable, robust, and highly available secrets management infrastructure Application teams reaching out to them to provision or manage app secrets
Developer/migration engineer Migrate applications and their secrets to the cloud Independent control and management of their app secrets Dependency on external teams

To sum up the requirements from all the personas mentioned here: The approach to provision and consume secrets should be secure, governed, easily scalable, and self-service.

We’ll now discuss how to discover and classify secrets and design the migration in a way that helps you to meet these varied requirements.

Discovery — Assess and categorize existing secrets

The initial discovery phase involves running sessions aimed at discovering, assessing, and categorizing secrets. Migrating applications and associated infrastructure to the cloud requires a strategic and methodical approach to progressively discover and analyze IT assets. This analysis can be used to create high-confidence migration wave plans. You should treat secrets as IT assets and include them in the migration assessment planning.

For application-related secrets, arguably the most appropriate time to migrate a secret is when the application that uses the secret is being migrated itself. This lets you track and report the use of secrets as soon as the application begins to operate in the cloud. If secrets are left on-premises during an application migration, this often creates a risk to the availability of the application. The migrated application ends up having a dependency on the connectivity and availability of the on-premises secrets management system.

The activities performed in this phase are often handled by multiple teams. Depending on the purpose of the secret, this can be a mix of application developers, migration teams, and environment teams.

Following are some common secret types you might come across while migrating applications.

Type Description
Application secrets Secrets specific to an application
Client credentials Cloud to on-premises credentials or OAuth tokens (such as Okta, Google APIs, and so on)
Database credentials Credentials for cloud-hosted databases, for example, Amazon Redshift, Amazon RDS or Amazon Aurora, Amazon DocumentDB
Third-party credentials Vendor application credentials or API keys
Certificate private keys Custom applications or infrastructure that might require programmatic access to the private key
Cryptographic keys Cryptographic keys used for data encryption or digital signatures
SSH keys Centralized management of SSH keys can potentially make it easier to rotate, update, and track keys
AWS access keys On-premises to cloud credentials (IAM)

Creating an inventory for secrets becomes simpler when organizations have an IT asset management (ITAM) or Identity and Access Management (IAM) tool to manage their IT assets (such as secrets) effectively. For organizations that don’t have an on-premises secrets management system, creating an inventory of secrets is a combination of manual and automated efforts. Application subject matter experts (SMEs) should be engaged to find the location of secrets that the application uses. In addition, you can use commercial tools to scan endpoints and source code and detect secrets that might be hardcoded in the application. Amazon CodeGuru is a service that can detect secrets in code. It also provides an option to migrate these secrets to Secrets Manager.

AWS has previously described seven common migration strategies for moving applications to the cloud. These strategies are refactor, replatform, repurchase, rehost, relocate, retain, and retire. For the purposes of migrating secrets, we recommend condensing these seven strategies into three: retire, retain, and relocate. You should evaluate every secret that is being considered for migration against a decision tree to determine which of these three strategies to use. The decision tree evaluates each secret against key business drivers like cost reduction, risk appetite, and the need to innovate. This allows teams to assess if a secret can be replaced by native AWS services, needs to be retained on-premises, migrated to Secrets Manager, or retired. Figure 1 shows this decision process.

Figure 1: Decision tree for assessing a secret for migration

Figure 1: Decision tree for assessing a secret for migration

Capture the associated details for secrets that are marked as RELOCATE. This information is essential and must remain confidential. Some secret metadata is transitive and can be derived from related assets, including details such as itsm-tier, sensitivity-rating, cost-center, deployment pipeline, and repository name. With Secrets Manager, you will use resource tags to bind this metadata with the secret.

You should gather at least the following information for the secrets that you plan to relocate and migrate to AWS Secrets Manager.

Metadata about secrets Rationale for gathering data
Secrets team name or owner Gathering the name or email address of the individual or team responsible for managing secrets can aid in verifying that they are maintained and updated correctly.
Secrets application name or ID To keep track of which applications use which secrets, it is helpful to collect application details that are associated with these secrets.
Secrets environment name or ID Gathering information about the environment to which secrets belong, such as “prod,” “dev,” or “test,” can assist in the efficient management and organization of your secrets.
Secrets data classification Understanding your organization’s data classification policy can help you identify secrets that contain sensitive or confidential information. It is recommended to handle these secrets with extra care. This information, which may be labeled “confidential,” “proprietary,” or “personally identifiable information (PII),” can indicate the level of sensitivity associated with a particular secret according to your organization’s data classification policy or standard.
Secrets function or usage If you want to quickly find the secrets you need for a specific task or project, consider documenting their usage. For example, you can document secrets related to “backup,” “database,” “authentication,” or “third-party integration.” This approach can allow you to identify and retrieve the necessary secrets within your infrastructure without spending a lot of time searching for them.

This is also a good time to decide on the rotation strategy for each secret. When you rotate a secret, you update the credentials in both Secrets Manager and the service to which that secret provides access (in other words, the resource). Secrets Manager supports automatic rotation of secrets based on a schedule.

Design the migration solution

In this phase, security and environment teams work together to onboard the Secrets Manager service to their organization’s cloud environment. This involves defining access controls, guardrails, and logging capabilities so that the service can be consumed in a regulated and governed manner.

As a starting point, use the following design principles mentioned in the Security Pillar of the AWS Well Architected Framework to design a migration solution:

  • Implement a strong identity foundation
  • Enable traceability
  • Apply security at all layers
  • Automate security best practices
  • Protect data at rest and in transit
  • Keep people away from data
  • Prepare for security events

The design considerations covered in the rest of this section will help you prepare your AWS environment to host production-grade secrets. This phase can be run in parallel with the discovery phase.

Design your access control system to establish a strong identity foundation

In this phase, you define and implement the strategy to restrict access to secrets stored in Secrets Manager. You can use the AWS Identity and Access Management (IAM) service to specify that identities (human and non-human IAM principals) are only able to access and manage secrets that they own. Organizations that organize their workloads and environments by using separate AWS accounts should consider using a combination of role-based access control (RBAC) and attribute-based access control (ABAC) to restrict access to secrets depending on the granularity of access that’s required.

You can use a scalable automation to deploy and update key IAM roles and policies, including the following:

  • Pipeline deployment policies and roles — This refers to IAM roles for CICD pipelines. These pipelines should be the primary mechanism for creating, updating, and deleting secrets in the organization.
  • IAM Identity Center permission sets — These allow human identities access to the Secrets Manager API. We recommend that you provision secrets by using infrastructure as code (IaC). However, there are instances where users need to interact directly with the service. This can be for initial testing, troubleshooting purposes, or updating a secret value when automatic rotation fails or is not enabled.
  • IAM permissions boundary — Boundary policies allow application teams to create IAM roles in a self-serviced, governed, and regulated manner.

Most organizations have Infrastructure, DevOps, or Security teams that deploy baseline configurations into AWS accounts. These solutions help these teams govern the AWS account and often have their own secrets. IAM policies should be created such that the IAM principals created by the application teams are unable to access secrets that are owned by the environment team, and vice versa. To enforce this logical boundary, you can use tagging and naming conventions on your secrets by using IAM.

A sample scheme for tagging your secrets can look like the following.

Tag key Tag value Notes Policy elements Secret tags
appname
  • Lowercase
  • Alphanumeric only
  • User friendly
  • Quickly identifiable
A user-friendly name for the application PrincipalTag/ appname =<value> (applies to role)
RequestTag/ appname =<value> (applies to caller)
SecretManager:ResourceTag/ appname=<value> (applies to the secret)
appname:<value>
appid
  • Lowercase
  • Alphanumeric only
  • Unique across the organization
  • Fixed length (5–7 characters)
Uniquely identifies the application among other cloud-hosted apps PrincipalTag/appid=<value>
RequestTag/appid=<value>
SecretManager:ResourceTag/appid=<value>
appid:<value>
appfunc
  • Lowercase
  • Fixed values (for example, web, msg, dba, api, storage, container, middleware, tool, service)
Used to describe the function of a particular target that the secret material is associated with (for example, web server, message broker, database) PrincipalTag/appfunc=<value>
RequestTag/appfunc=<value>
SecretManager:ResourceTag/appfunc=<value>
Appfunc:<value>
appenv
  • Lowercase
  • Fixed values (for example, dev, test, nonp, prod)
An identifier for the secret usage environment PrincipalTag/appenv=<value>
RequestTag/appenv=<value>
SecretManager:ResourceTag/appenv=<value>
appenv:<value>
dataclassification
  • Lowercase
  • Fixed values (for example, protected, confidential)
Use your organization’s data classification standards to classify the secrets PrincipalTag/dataclassification=<value>
RequestTag/dataclassification=<value>
SecretManager:ResourceTag/dataclassification=<value>
Dataclassification:<value>

If you maintain a registry that documents details of your cloud-hosted applications, most of these tags can be derived from the registry.

It’s common to apply different security and operational policies for the non-production and production environments of a given workload. Although production environments are generally deployed in a dedicated account, it’s common to have less critical non-production apps and environments coexisting in the same AWS account. For operation and governance at scale in these multi-tenanted accounts, you can use attribute-based access control (ABAC) to manage secure access to secrets. ABAC enables you to grant permissions based on tags. The main benefits of using tag-based access control are its scalability and operational efficiency.

Figure 2 shows an example of ABAC in action, where an IAM policy allows access to a secret only if the appfunc, appenv, and appid tags on the secret match the tags on the IAM principal that is trying to access the secrets.

Figure 2: ABAC access control

Figure 2: ABAC access control

ABAC works as follows:

  • Tags on a resource define who can access the resource. It is therefore important that resources are tagged upon creation.
  • For a create secret operation, IAM verifies whether the Principal tags on the IAM identity that is making the API call match the request tags in the request.
  • For an update, delete, or read operation, IAM verifies that the Principal tags on the IAM identity that is making the API call match the resource tags on the secret.
  • Regardless of the number of workloads or environments that coexist in the same account, you only need to create one ABAC-based IAM policy. This policy is the same for different kinds of accounts and can be deployed by using a capability like AWS CloudFormation StackSets. This is the reason that ABAC scales well for scenarios where multiple applications and environments are deployed in the same AWS account.
  • IAM roles can use a common IAM policy, such as the one described in the previous bullet point. You need to verify that the roles have the correct tags set on them, according to your tagging convention. This will automatically grant the roles access to the secrets that have the same resource tags.
  • Note that with this approach, tagging secrets and IAM roles becomes the most critical component for controlling access. For this reason, all tags on IAM roles and secrets on Secrets Manager must follow a standard naming convention at all times.

The following is an ABAC-based IAM policy that allows creation, updates, and deletion of secrets based on the tagging scheme described in the preceding table.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Condition": {
                "StringEquals": {
                    "secretsmanager:ResourceTag/appfunc": "${aws:PrincipalTag/appfunc}",
                    "secretsmanager:ResourceTag/appenv": "${aws:PrincipalTag/appenv}",
                    "secretsmanager:ResourceTag/name": "${aws:PrincipalTag/name}",
                    "secretsmanager:ResourceTag/appid": "${aws:PrincipalTag/appid}"
                }
            },
            "Action": [
                "secretsmanager:GetSecretValue",
                "secretsmanager:PutSecretValue",
                "secretsmanager:UpdateSecret",
                "secretsmanager:DeleteSecret"
            ],
            "Resource": "arn:aws:secretsmanager:ap-southeast-2:*:secret:${aws:PrincipalTag/name}/${aws:PrincipalTag/appid}/${aws:PrincipalTag/appfunc}/${aws:PrincipalTag/appenv}*",
            "Effect": "Allow",
            "Sid": "AccessBasedOnResourceTags"
        },
        {
            "Condition": {
                "StringEquals": {
                    "aws:RequestTag/appfunc": "${aws:PrincipalTag/appfunc}",
                    "aws:RequestTag/appid": "${aws:PrincipalTag/appid}",
                    "aws:RequestTag/name": "${aws:PrincipalTag/name}",
                    "aws:RequestTag/appenv": "${aws:PrincipalTag/appenv}"
                }
            },
            "Action": [
                "secretsmanager:TagResource",
                "secretsmanager:CreateSecret"
            ],
            "Resource": "arn:aws:secretsmanager:ap-southeast-2:*:secret:${aws:PrincipalTag/name}/${aws:PrincipalTag/appid}/${aws:PrincipalTag/appfunc}/${aws:PrincipalTag/appenv}*",
            "Effect": "Allow",
            "Sid": "AccessBasedOnRequestTags"
        }
    ]
}

In addition to controlling access, this policy also enforces a naming convention. IAM principals will only be able to create a secret that matches the following naming scheme.

Secret name = value of tag-key (appid + appfunc + appenv + name)
For example, /ordersapp/api/prod/logisticsapi

You can choose to implement ABAC so that the resource name matches the principal tags or the resource tags match the principal tags, or both. These are just different types of ABAC. The sample policy provided here implements both types. It’s important to note that because ABAC-based IAM policies are shared across multiple workloads, potential misconfigurations in the policies will have a wider scope of impact.

For more information about building your ABAC strategy, refer to the blog post Working backward: From IAM policies and principal tags to standardized names and tags for your AWS resources.

You can also add checks in your pipeline to provide early feedback for developers. These checks may potentially assist in verifying whether appropriate tags have been set up in IaC resources prior to their creation. Your pipeline-based controls provide an additional layer of defense and complement or extend restrictions enforced by IAM policies.

Resource-based policies

Resource-based policies are a flexible and powerful mechanism to control access to secrets. They are directly associated with a secret and allow specific principals mentioned in the policy to have access to the secret. You can use these policies to grant identities (internal or external to the account) access to a secret.

If your organization uses resource policies, security teams should come up with control objectives for these policies. Controls should be set so that only resource-based policies meeting your organizations requirements are created. Control objectives for resource policies may be set as follows:

  • Allow statements in the policy to have allow access to the secret from the same application.
  • Allow statements in the policy to have allow access from organization-owned cross-account identities only if they belong to the same environment. Controls that meet these objectives can be preventative (checks in pipeline) or responsive (config rules and Amazon EventBridge invoked Lambda functions).

Environment teams can also choose to provision resource-based policies for application teams. The provision process can be manual, but is preferably automated. An example would be that these teams can allow application teams to tag secrets with specific values, like a cross-account IAM role Amazon Resource Number (ARN) that needs access. An automation invoked by EventBridge rules then asserts that the cross-account principal in the tag belongs to the organization and is in the same environment, and then provisions a resource-based policy for the application team. Using such mechanisms creates a self-service way for teams to create safe resource policies that meet common use cases.

Resource-based policies for Secrets Manager can be a helpful tool for controlling access to secrets, but it is important to consider specific situations where alternative access control mechanisms might be more appropriate. For example, if your access control requirements for secrets involve complex conditions or dependencies that cannot be easily expressed using the resource-based policy syntax, it may be challenging to manage and maintain the policies effectively. In such cases, you may want to consider using a different access control mechanism that better aligns with your requirements. For help determining which type of policy to use, see Identity-based policies and resource-based policies.

Design detective controls to achieve traceability, monitoring, and alerting

Prepare your environment to record and flag events of interest when Secrets Manager is used to store and update secrets. We recommend that you start by identifying risks and then formulate objectives and devise control measures for each identified risk, as follows:

  • Control objectives — What does the control evaluate, and how is it configured? Controls can be configured by using CloudTrail events invoked by Lambda functions, AWS config rules, or CloudWatch alarms. Controls can evaluate a misconfigured property in a secrets resource or report on an event of interest.
  • Target audience — Identify teams that should be notified if the event occurs. This can be a combination of the environment, security, and application teams.
  • Notification type — SNS, email, Slack channel notifications, or an ITIL ticket.
  • Criticality — Low, medium, or high, based on the criticality of the event.

The following is a sample matrix that can serve as a starting point for documenting detective controls for Secrets Manager. The column titled AWS services in the table offers some suggestions for implementation to help you meet your control objetves.

Risk Control objective Criticality AWS services
A secret is created without tags that match naming and tagging schemes
  • Enforce least privilege
  • Establish logging and monitoring
  • Manage secrets
HIGH (if using ABAC) CloudTrail invoked Lambda function or custom AWS config rule
IAM related tags on a secret are updated, removed
  • Manage secrets
  • Enforce least privilege
HIGH (if using ABAC) CloudTrail invoked Lambda function or custom config rule
A resource policy is created when resource policies have not been onboarded to the environment
  • Manage secrets
  • Enforce least privilege
HIGH Pipeline or CloudTrail invoked ¬Lambda function or custom config rule
A secret is marked for deletion from an unusual source — root user or admin break glass role
  • Improve availability
  • Protect configurations
  • Prepare for incident response
  • Manage secrets
HIGH CloudTrail invoked Lambda function
A non-compliant resource policy was created — for example, to provide secret access to a foreign account
  • Enforce least privilege
  • Manage secrets
HIGH CloudTrail invoked Lambda function or custom config rule
An AWS KMS key for secrets encryption is marked for deletion
  • Manage secrets
  • Protect configurations
HIGH CloudTrail invoked Lambda function
A secret rotation failed
  • Manage secrets
  • Improve availability
MEDIUM Managed config rule
A secret is inactive and is not being accessed for x number of days
  • Optimize costs
LOW Managed config rule
Secrets are created that do not use KMS key
  • Encrypt data at rest
LOW Managed config rule
Automatic rotation is not enabled
  • Manage secrets
LOW Managed config rule
Successful create, update, and read events for secrets
  • Establish logging and monitoring
LOW CloudTrail logs

We suggest that you deploy these controls in your AWS accounts by using a scalable mechanism, such as CloudFormation StackSets.

For more details, see the following topics:

Design for additional protection at the network layer

You can use the guiding principles for Zero Trust networking to add additional mechanisms to control access to secrets. The best security doesn’t come from making a binary choice between identity-centric and network-centric controls, but by using both effectively in combination with each other.

VPC endpoints allow you to provide a private connection between your VPC and Secrets Manager API endpoints. They also provide the ability to attach a policy that allows you to enforce identity-centric rules at a logical network boundary. You can use global context keys like aws:PrincipalOrgID in VPC endpoint policies to allow requests to Secrets Manager service only from identities that belong to the same AWS organization. You can also use aws:sourceVpce and aws:sourceVpc IAM conditions to allow access to the secret only if the request originates from a specific VPC endpoint or VPC, respectively.

For more details on VPC endpoints, see Using an AWS Secrets Manager VPC endpoint.

Design for least privileged access to encryption keys

To reduce unauthorized access, secrets should be encrypted at rest. Secrets Manager integrates with AWS KMS and uses envelope encryption. Every secret in Secrets Manager is encrypted with a unique data key. Each data key is protected by a KMS key. Whenever the secret value inside a secret changes, Secrets Manager generates a new data key to protect it. The data key is encrypted under a KMS key and stored in the metadata of the secret. To decrypt the secret, Secrets Manager first decrypts the encrypted data key by using the KMS key in AWS KMS.

The following is a sample AWS KMS policy that permits cryptographic operations to a KMS key only from the Secrets Manager service within an AWS account, and allows the AWS KMS decrypt action from a specific IAM principal throughout the organization.

{
    "Version": "2012-10-17",
    "Id": "secrets_manager_encrypt_org",
    "Statement": [
        {
            "Sid": "Root Access",
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::444455556666:root"
            },
            "Action": "kms:*",
            "Resource": "*"
        },
        {
            "Sid": "Allow access for Key Administrators",
            "Effect": "Allow",
            "Principal": {
                "AWS": [
             "arn:aws:iam::444455556666:role/platformRoles/KMS-key-admin-role",                    "arn:aws:iam::444455556666:role/platformRoles/KMS-key-automation-role"
                ]
            },
            "Action": [
                "kms:CancelKeyDeletion",
                "kms:Create*",
                "kms:Delete*",
                "kms:Describe*",
                "kms:Disable*",
                "kms:Enable*",
                "kms:Get*",
                "kms:List*",
                "kms:Put*",
                "kms:Revoke*",
                "kms:ScheduleKeyDeletion",
                "kms:TagResource",
                "kms:UntagResource",
                "kms:Update*"
            ],
            "Resource": "*"
        },
        {
            "Sid": "Allow Secrets Manager use of the KMS key for a specific account",
            "Effect": "Allow",
            "Principal": {
                "AWS": "*"
            },
            "Action": [
                "kms:Encrypt",
                "kms:Decrypt",
                "kms:ReEncrypt*",
                "kms:GenerateDataKey*",
                "kms:CreateGrant",
                "kms:ListGrants",
                "kms:DescribeKey"
            ],
            "Resource": "*",
            "Condition": {
                "StringEquals": {
                    "kms:CallerAccount": "444455556666",
                    "kms:ViaService": "secretsmanager.us-east-1.amazonaws.com"
                }
            }
        },
        {
            "Sid": "Allow use of Secrets Manager secrets from a specific IAM role (service account) throughout your org",
            "Effect": "Allow",
            "Principal": {
                "AWS": "*"
            },
            "Action": "kms:Decrypt",
            "Resource": "*",
            "Condition": {
                "StringEquals": {
                    "aws:PrincipalOrgID": "o-exampleorgid"
                },
                "StringLike": {
                    "aws:PrincipalArn": "arn:aws:iam::*:role/platformRoles/secretsAccessRole"
                }
            }
        }
    ]
}

Additionally, you can use the secretsmanager:KmsKeyId IAM condition key to allow secrets creation only when AWS KMS encryption is enabled for the secret. You can also add checks in your pipeline that allow the creation of a secret only when a KMS key is associated with the secret.

Design or update applications for efficient retrieval of secrets

In applications, you can retrieve your secrets by calling the GetSecretValue function in the available AWS SDKs. However, we recommend that you cache your secret values by using client-side caching. Caching secrets can improve speed, help to prevent throttling by limiting calls to the service, and potentially reduce your costs.

Secrets Manager integrates with the following AWS services to provide efficient retrieval of secrets:

  • For Amazon RDS, you can integrate with Secrets Manager to simplify managing master user passwords for Amazon RDS database instances. Amazon RDS can manage the master user password and stores it securely in Secrets Manager, which may eliminate the need for custom AWS Lambda functions to manage password rotations. The integration can help you secure your database by encrypting the secrets, using your own managed key or an AWS KMS key provided by Secrets Manager. As a result, the master user password is not visible in plaintext during the database creation workflow. This feature is available for the Amazon RDS and Aurora engines, and more information can be found in the Amazon RDS and Aurora User Guides.
  • For Amazon Elastic Kubernetes Service (Amazon EKS), you can use the AWS Secrets and Configuration Provider (ASCP) for the Kubernetes Secrets Store CSI Driver. This open-source project enables you to mount Secrets Manager secrets as Kubernetes secrets. The driver translates Kubernetes secret objects into Secrets Manager API calls, allowing you to access and manage secrets from within Kubernetes. After you configure the Kubernetes Secrets Store CSI Driver, you can create Kubernetes secrets backed by Secrets Manager secrets. These secrets are securely stored in Secrets Manager and can be accessed by your applications that are running in Amazon EKS.
  • For Amazon Elastic Container Service (Amazon ECS), sensitive data can be securely stored in Secrets Manager secrets and then accessed by your containers through environment variables or as part of the log configuration. This allows for a simple and potentially safe injection of sensitive data into your containers, making it a possible solution for your needs.
  • For AWS Lambda, you can use the AWS Parameters and Secrets Lambda Extension to retrieve and cache Secrets Manager secrets in Lambda functions without the need for an AWS SDK. It is noteworthy that retrieving a cached secret is faster compared to the standard method of retrieving secrets from Secrets Manager. Moreover, using a cache can be cost-efficient, because there is a charge for calling Secrets Manager APIs. For more details, see the Secrets Manager User Guide.

For additional information on how to use Secrets Manager secrets with AWS services, refer to the following resources:

Develop an incident response plan for security events

It is recommended that you prepare for unforeseeable incidents such as unauthorized access to your secrets. Developing an incident response plan can help minimize the impact of the security event, facilitate a prompt and effective response, and may help to protect your organization’s assets and reputation. The traceability and monitoring controls we discussed in the previous section can be used both during and after the incident.

The Computer Security Incident Handling Guide SP 800-61 Rev. 2, which was created by the National Institute of Standards and Technology (NIST), can help you create an incident response plan for specific incident types. It provides a thorough and organized approach to incident response, covering everything from initial preparation and planning to detection and analysis, containment, eradication, recovery, and follow-up. The framework emphasizes the importance of continual improvement and learning from past incidents to enhance the overall security posture of the organization.

Refer to the following documentation for further details and sample playbooks:

Conclusion

In this post, we discussed how organizations can take a phased approach to migrate their secrets to AWS Secrets Manager. Your teams can use the thought exercises mentioned in this post to decide if they would like to rehost, replatform, or retire secrets. We discussed what guardrails should be enabled for application teams to consume secrets in a safe and regulated manner. We also touched upon ways organizations can discover and classify their secrets.

In Part 2 of this series, we go into the details of the migration implementation phase and walk you through a sample solution that you can use to integrate on-premises applications with Secrets Manager.

If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, start a new thread on the AWS Secrets Manager re:Post or contact AWS Support.

Want more AWS Security news? Follow us on Twitter.

Eric Swamy

Eric Swamy

Eric is a Senior Security Consultant working in the Professional Services team in Sydney, Australia. He is passionate about helping customers build the confidence and technical capability to move their most sensitive workloads to cloud. When not at work, he loves to spend time with his family and friends outdoors, listen to music, and go on long walks.

Adesh Gairola

Adesh Gairola

Adesh Gairola is a Senior Security Consultant at Amazon Web Services in Sydney, Australia. Adesh is eager to help customers build robust defenses, and design and implement security solutions that enable business transformations. He is always looking for new ways to help customers improve their security posture.

[$] Exceptions in BPF

Post Syndicated from original https://lwn.net/Articles/938435/

The BPF virtual machine in the kernel has been steadily gaining new
features for years, many of which add capabilities that C programmers do
not ordinarily have. So, from one point of view, it was only a matter of
time before BPF gained support for exceptions. As it turns out, though,
this “exceptions” feature is aimed at a specific use case, and its use in
most programs will be truly exceptional.

Temporal data lake architecture for benchmark and indices analytics

Post Syndicated from Krishna Gogineni original https://aws.amazon.com/blogs/architecture/temporal-data-lake-architecture-for-benchmark-and-indices-analytics/

Financial trading houses and stock exchanges generate enormous volumes of data in near real-time, making it difficult to perform bi-temporal calculations that yield accurate results. Achieving this requires a processing architecture that can handle large volumes of data during peak bursts, meet strict latency requirements, and scale according to incoming volumes.

In this post, we’ll describe a scenario for an industry leader in the financial services sector and explain how AWS services are used for bi-temporal processing with state management and scale based on variable workloads during the day, all while meeting strict service-level agreement (SLA) requirements.

Problem statement

To design and implement a fully temporal transactional data lake with the repeatable read isolation level for queries is a challenge, particularly with burst events that need the overall architecture to scale accordingly. The data store in the overall architecture needs to record the value history of data at different times, which is especially important for financial data. Financial data can include corporate actions, annual or quarterly reports, or fixed-income securities, like bonds that have variable rates. It’s crucial to be able to correct data inaccuracies during the reporting period.

The example customer seeks a data processing platform architecture to dynamically scale based on the workloads with a capacity of processing 150 million records under 5 minutes. Their platform should be capable of meeting the end-to-end SLA of 15 minutes, from ingestion to reporting, with lowest total cost of ownership. Additionally, managing bi-temporal data requires a database that has critical features, such as ACID (atomicity, consistency, isolation, durability) compliance, time-travel capability, full-schema evolution, partition layout and evolution, rollback to prior versions, and SQL-like query experience.

Solution overview

The solution architecture key building blocks are Amazon Kinesis Data Streams for streaming data, Amazon Kinesis Data Analytics with Apache Flink as processing engine, Flink’s RocksDB for state management, and Apache Iceberg on Amazon Simple Storage Service (Amazon S3) as the storage engine (Figure 1).

End-to-end data-processing architecture

Figure 1. End-to-end data-processing architecture

Data processing

Here’s how it works:

  • A publisher application receives the data from the source systems and publishes data into Kinesis Data Streams using a well-defined JSON format structure.
  • Kinesis Data Streams holds the data for a duration that is configurable so data is not lost and can auto scale based on the data volume ingested.
  • Kinesis Data Analytics runs an Apache Flink application, with state management (RocksDB), to handle bi-temporal calculations. The Apache Flink application consumes data from Kinesis Data Streams and performs the following computations:
    • Transforms the JSON stream into a row-type record, compatible with a SQL table-like structure, resolving nesting and parent–child relationships present within the stream
    • Checks whether the record has already an existing state in in-memory RocksDB or disk attached to Kinesis Data Analytics computational node to avoid read latency from the database, which is critical for meeting the performance requirements
    • Performs bi-temporal calculations and creates the resultant records in an in-memory data structure before invoking the Apache Iceberg sink operator
    • The Apache Flink application sink operator appends the temporal states, expressed as records into existing Apache Iceberg data store. This will comply with key principles of time series data, which is immutable, and the ability to time-travel along with ACID compliance, schema evolution, and partition evolution
  • Kinesis Data Analytics is resilient and provides a no-data-loss capability, with features like periodic checkpoints and savepoints. They are used to store the state management in a secure Amazon S3 location that can be accessed outside of Kinesis Data Analytics. This savepoints mechanism can be used to programmatically to scale the cluster size based on the workloads using time-driven scheduling and AWS Lambda functions.
  • If the time-to-live feature of RocksDB is implemented, old records are stored in Apache Iceberg on Amazon S3. When performing temporal calculations, if the state is not found in memory, data is read from Apache Iceberg into RocksDB and the processing is completed. However, this step is optional and can be circumvented if the Kinesis Data Analytics cluster is initialized with right number of Kinesis processing units to hold the historical information, as per requirements.
  • Because the data is stored in an Apache Iceberg table format in Amazon S3, data is queried using Trino, which supports Apache Iceberg table format.
  • The end user queries data using any SQL tool that supports the Trino query engine.

Apache Iceberg maintenance jobs, such as data compaction, expire snapshot, delete orphan files, can be launched using Amazon Athena to optimize performance out of Apache Iceberg data store. Details of each processing step performed in Apache Flink application are captured using Amazon CloudWatch, which logs all the events.

Scalability

Amazon EventBridge scheduler invokes a Lambda function to scale the Kinesis Data Analytics. Kinesis Data Analytics has a short outage during rescaling that is proportional to the amount of data stored in RocksDB, which is why a state management strategy is necessary for the proper operation of the system.

Figure 2 shows the scaling process, which depicts:

  • Before peak load: The Kinesis Data Analytics cluster is processing off-peak records with minimum configuration before the peak load. A scheduled event is launched from EventBridge that invokes a Lambda function, which shuts down the cluster using the savepoint mechanism and scales up the Kinesis Data Analytics cluster to required Kinesis processing units.
  • During peak load: When the peak data burst happens, the Kinesis Data Analytics cluster is ready to handle the volume of data from Kinesis Data Stream, and processes it within the SLA of 5 minutes.
  • After peak load: A scheduled event from EventBridge invokes a Lambda function to scale down the Kinesis Data Analytics cluster to the minimum configuration that holds the required state for the entire volume of records.
Cluster scaling before, during, and after peak data volume processing

Figure 2. Cluster scaling before, during, and after peak data volume processing

Performance insights

With the discussed architecture, we want to demonstrate that the we are able to meet the SLAs, in terms of performance and processing times. We have taken a subset of benchmarks and indices data and processed the same with the end-to-end architecture. During the process, we observed some very interesting findings, which we would like to share.

Processing time for Apache Iceberg Upsert vs Append operations: During our tests, we expected Upsert operation to be faster than append. But on the contrary, we noticed that Append operations were faster compared to Upsert even though more computations are performed in the Apache Flink application. In our test with 3,500,000 records, Append operation took 1556 seconds while Upsert took 1675 seconds to process the data (Figure 3).

Processing times for Upsert vs. Append

Figure 3. Processing times for Upsert vs. Append

Compute consumption for Apache Iceberg Upsert vs. Append operations: Comparing the compute consumption for 10,000,000 records, we noticed that Append operation was able to process the data in the same amount of time as Upsert operation but with less compute resources. In our tests, we have noted that Append operation only consumed 64 Kinesis processing units, whereas Upsert consumed 78 Kinesis processing units (Figure 4).

Comparing consumption for Upsert vs. Append

Figure 4. Comparing consumption for Upsert vs. Append

Scalability vs performance: To achieve the desired data processing performance, we need a specific configuration of Kinesis processing units, Kinesis Data Streams, and Iceberg parallelism. In our test with the data that we chose, we started with four Kinesis processing units and four Kinesis data streams for data processing. We observed an 80% performance improvement in data processing with 16 Kinesis data processing units. An additional 6% performance improvement was demonstrated when we scaled to 32 Kinesis processing units. When we increased the Kinesis data streams to 16, we observed an additional 2% performance improvement (Figure 5).

Scalability vs. performance

Figure 5. Scalability vs. performance

Data volume processing times for Upsert vs. Append: For this test, we started with 350,000 records of data. When we increased data volume to 3.5M records, we observed that Append performing better than Upsert, demonstrating a five-fold increase in processing time (Figure 6).

Data volume processing times for Upsert vs. Append

Figure 6. Data volume processing times for Upsert vs. Append

Conclusion

The architecture we explored today scales based on the data-volume requirements of the customer and is capable of meeting the end-to-end SLA of 15 minutes, with a potential lowered total cost of ownership. Additionally, the solution is capable of handling high-volume, bi-temporal computations with ACID compliance, time travel, full-schema evolution, partition layout evolution, rollback to prior versions and SQL-like query experience.

Further reading

AWS re:Inforce 2023: Key announcements and session highlights

Post Syndicated from Nisha Amthul original https://aws.amazon.com/blogs/security/aws-reinforce-2023-key-announcements-and-session-highlights/

AWS re:Inforce

Thank you to everyone who participated in AWS re:Inforce 2023, both virtually and in-person. The conference featured a lineup of over 250 engaging sessions and hands-on labs, in collaboration with more than 80 AWS partner sponsors, over two days of immersive cloud security learning. The keynote was delivered by CJ Moses, AWS Chief Information Security Officer, Becky Weiss, AWS Senior Principal Engineer, and Debbie Wheeler, Delta Air Lines Chief Information Security Officer. They shared the latest innovations in cloud security from AWS and provided insights on how to foster a culture of security in your organization.

If you couldn’t join us or would like to revisit the insightful themes discussed, we’ve put together this blog post for you. It provides a comprehensive summary of all the key announcements made and includes information on where you can watch the keynote and sessions at your convenience.

Key announcements

Here are some of the top announcements that we made at AWS re:Inforce 2023:

  • Amazon Verified PermissionsVerified Permissions is a scalable permissions management and fine-grained authorization service for the applications you build. The service helps your developers build secure applications faster by externalizing authorization and centralizing policy management and administration. Developers can align their application access with Zero Trust principles by implementing least privilege and continual verification within applications. Security and audit teams can better analyze and audit who has access to what within applications. Amazon Verified Permissions uses Cedar, an open-source policy language for access control that empowers developers and admins to define policy-based access controls using roles and attributes for context-aware access control.
  • Amazon Inspector code scanning of Lambda functions Amazon Inspector now supports code scanning of AWS Lambda functions, expanding the existing capability to scan Lambda functions and associated layers for software vulnerabilities in application package dependencies. Amazon Inspector code scanning of Lambda functions scans custom proprietary application code you write within Lambda functions for security vulnerabilities such as injection flaws, data leaks, weak cryptography, or missing encryption. Upon detecting code vulnerabilities within the Lambda function or layer, Amazon Inspector generates actionable security findings that provide several details, such as security detector name, impacted code snippets, and remediation suggestions to address vulnerabilities. The findings are aggregated in the Amazon Inspector console and integrated with AWS Security Hub and Amazon EventBridge for streamlined workflow automation.
  • Amazon Inspector SBOM export Amazon Inspector now offers the ability to export a consolidated Software Bill of Materials (SBOMs) for resources that it monitors across your organization in multiple industry-standard formats, including CycloneDx and Software Package Data Exchange (SPDX). With this new capability, you can use automated and centrally managed SBOMs to gain visibility into key information about your software supply chain. This includes details about software packages used in the resource, along with associated vulnerabilities. SBOMs can be exported to an Amazon Simple Storage Service (Amazon S3) bucket and downloaded for analyzing with Amazon Athena or Amazon QuickSight to visualize software supply chain trends. This functionality is available with a few clicks in the Amazon Inspector console or using Amazon Inspector APIs.
  • Amazon CodeGuru Security Amazon CodeGuru Security offers a comprehensive set of APIs that are designed to seamlessly integrate with your existing pipelines and tooling. CodeGuru Security serves as a static application security testing (SAST) tool that uses machine learning to help you identify code vulnerabilities and provide guidance you can use as part of remediation. CodeGuru Security also provides in-context code patches for certain classes of vulnerabilities, helping you reduce the effort required to fix code.
  • Amazon EC2 Instance Connect EndpointAmazon Elastic Compute Cloud (Amazon EC2) announced support for connectivity to instances using SSH or RDP in private subnets over the Amazon EC2 Instance Connect Endpoint (EIC Endpoint). With this capability, you can connect to your instances by using SSH or RDP from the internet without requiring a public IPv4 address.
  • AWS built-in partner solutions AWS built-in partner solutions are co-built with AWS experts, helping to ensure that AWS Well-Architected security reference architecture guidelines and best security practices were rigorously followed. AWS built-in partner solutions can save you valuable time and resources by getting the building blocks of cloud development right when you begin a migration or modernization initiative. AWS built-in solutions also automate deployments and can reduce installation time from months or weeks to a single day. Customers often look to our partners for innovation and help with “getting cloud right.” Now, partners with AWS built-in solutions can help you be more efficient and drive business value for both partner software and AWS native services.
  • AWS Cyber Insurance Partners AWS has worked with leading cyber insurance partners to help simplify the process of obtaining cyber insurance. You can now reduce business risk by finding and procuring cyber insurance directly from validated AWS cyber insurance partners. To reduce the amount of paperwork and save time, download and share your AWS Foundational Security Best Practices Standard detailed report from AWS Security Hub and share the report with the AWS Cyber Insurance Partner of your choice. With AWS vetted cyber insurance partners, you can have confidence that these insurers understand AWS security posture and are evaluating your environment according to the latest AWS Security Best Practices. Now you can get a full cyber insurance quote in just two business days.
  • AWS Global Partner Security Initiative With the AWS Global Partner Security Initiative, AWS will jointly develop end-to-end security solutions and managed services, leveraging the capabilities, scale, and deep security knowledge of our Global System Integrators (GSI) partners.
  • Amazon Detective finding groups Amazon Detective expands its finding groups capability to include Amazon Inspector findings, in addition to Amazon GuardDuty findings. Using machine learning, this extension of the finding groups feature significantly streamlines the investigation process, reducing the time spent and helping to improve identification of the root cause of security incidents. By grouping findings from Amazon Inspector and GuardDuty, you can use Detective to answer difficult questions such as “was this EC2 instance compromised because of a vulnerability?” or “did this GuardDuty finding occur because of unintended network exposure?” Furthermore, Detective maps the identified findings and their corresponding tactics, techniques, and procedures to the MITRE ATT&CK framework, enhancing the overall effectiveness and alignment of security measures.
  • [Pre-announce] AWS Private Certificate Authority Connector for Active Directory –— AWS Private CA will soon launch a Connector for Active Directory (AD). The Connector for AD will help to reduce upfront public key infrastructure (PKI) investment and ongoing maintenance costs with a fully managed serverless solution. This new feature will help reduce PKI complexity by replacing on-premises certificate authorities with a highly secure hardware security module (HSM)-backed AWS Private CA. You will be able to automatically deploy certificates using auto-enrollment to on-premises AD and AWS Directory Service for Microsoft Active Directory.
  • AWS Payment Cryptography The day before re:Inforce, AWS Payment Cryptography launched with general availability. This service simplifies cryptography operations in cloud-hosted payment applications. AWS Payment Cryptography simplifies your implementation of the cryptographic functions and key management used to secure data and operations in payment processing in accordance with various PCI standards.
  • AWS WAF Fraud Control launches account creation fraud prevention AWS WAF Fraud Control announces Account Creation Fraud Prevention, a managed protection for AWS WAF that’s designed to prevent creation of fake or fraudulent accounts. Fraudsters use fake accounts to initiate activities, such as abusing promotional and sign-up bonuses, impersonating legitimate users, and carrying out phishing tactics. Account Creation Fraud Prevention helps protect your account sign-up or registration pages by allowing you to continuously monitor requests for anomalous digital activity and automatically block suspicious requests based on request identifiers and behavioral analysis.
  • AWS Security Hub automation rules AWS Security Hub, a cloud security posture management service that performs security best practice checks, aggregates alerts, and facilitates automated remediation, now features a capability to automatically update or suppress findings in near real time. You can now use automation rules to automatically update various fields in findings, suppress findings, update finding severity and workflow status, add notes, and more.
  • Amazon S3 announces dual-layer server-side encryption Amazon S3 is the only cloud object storage service where you can apply two layers of encryption at the object level and control the data keys used for both layers. Dual-layer server-side encryption with keys stored in AWS Key Management Service (DSSE-KMS) is designed to adhere to National Security Agency Committee on National Security Systems Policy (CNSSP) 15 for FIPS compliance and Data-at-Rest Capability Package (DAR CP) Version 5.0 guidance for two layers of MFS U/00/814670-15 Commercial National Security Algorithm (CNSA) encryption.
  • AWS CloudTrail Lake dashboards AWS CloudTrail Lake, a managed data lake that lets organizations aggregate, immutably store, visualize, and query their audit and security logs, announces the general availability of CloudTrail Lake dashboards. CloudTrail Lake dashboards provide out-of-the-box visualizations and graphs of key trends from your audit and security data directly within the CloudTrail console. It also offers the flexibility to drill down on additional details, such as specific user activity, for further analysis and investigation using CloudTrail Lake SQL queries.
  • AWS Well-Architected Profiles AWS Well-Architected introduces Profiles, which allows you to tailor your Well-Architected reviews based on your business goals. This feature creates a mechanism for continuous improvement by encouraging you to review your workloads with certain goals in mind first, and then complete the remaining Well-Architected review questions.

Watch on demand

Leadership sessions — You can watch the leadership sessions to learn from AWS security experts as they talk about essential topics, including open source software (OSS) security, Zero Trust, compliance, and proactive security.

Breakout sessions, lightning talks, and more — Explore our content across these six tracks:

  • Application Security— Discover how AWS, customers, and AWS Partners move fast while understanding the security of the software they build.
  • Data Protection — Learn how AWS, customers, and AWS Partners work together to protect data. Get insights into trends in data management, cryptography, data security, data privacy, encryption, and key rotation and storage.
  • Governance, Risk, and Compliance — Dive into the latest hot topics in governance and compliance for security practitioners, and discover how to automate compliance tools and services for operational use.
  • Identity and Access Management — Learn how AWS, customers, and AWS Partners use AWS Identity Services to manage identities, resources, and permissions securely and at scale. Discover how to configure fine-grained access controls for your employees, applications, and devices and deploy permission guardrails across your organization.
  • Network and Infrastructure Security — Gain practical expertise on the services, tools, and products that AWS, customers, and partners use to protect the usability and integrity of their networks and data.
  • Threat Detection and Incident Response — Discover how AWS, customers, and AWS Partners get the visibility they need to improve their security posture, reduce the risk profile of their environments, identify issues before they impact business, and implement incident response best practices.
  • You can also watch our Lightning Talks and the AWS On Air day 1 and day 2 livestream on demand.

Session presentation downloads are also available on the AWS Events Content page. If you’re interested in further in-person security learning opportunities, consider registering for AWS re:Invent 2023, which will be held from November 27 to December 1 in Las Vegas, NV. We look forward to seeing you there!

If you would like to discuss how these new announcements can help your organization improve its security posture, AWS is here to help. Contact your AWS account team today.

If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, contact AWS Support.

Nisha Amthul

Nisha Amthul

Nisha is a Senior Product Marketing Manager at AWS Security, specializing in detection and response solutions. She has a strong foundation in product management and product marketing within the domains of information security and data protection. When not at work, you’ll find her cake decorating, strength training, and chasing after her two energetic kiddos, embracing the joys of motherhood.

Author

Satinder Khasriya

Satinder leads the product marketing strategy and implementation for AWS Network and Application protection services. Prior to AWS, Satinder spent the last decade leading product marketing for various network security solutions across several technologies, including network firewall, intrusion prevention, and threat intelligence. Satinder lives in Austin, Texas and enjoys spending time with his family and traveling.

Ами ако ГЕРБ издигне Росен Желязков?

Post Syndicated from Емилия Милчева original https://www.toest.bg/ami-ako-gerb-izdigne-rosen-zhelyazkov/

Ами ако ГЕРБ издигне Росен Желязков?

Този път може и да има битка за София, макар все още да не са известни имената на всички кандидати за местния вот наесен. Но някои все пак бяха огласени необичайно рано като за български избори. Лидерът на ГЕРБ Бойко Борисов не издава кметската номинация за София, която партията управлява от 14 години (всъщност от 18, тъй като Борисов застъпи през 2005-та). Кандидатът ще бъде обявен през септември.

Автентично дясно, неавтентично дясно и градска десница

Номинацията на проф. Вили Лилков от старозаветното дясно провокира спекулации, че той ще е кандидатът, подкрепен от ГЕРБ. „Граждани за европейско развитие на България“ си партнират със СДС от години. „Вили Лилков е много добър. С него съм работил още като кмет на София“, коментира Борисов. Така си е. И предишният кмет Стефан Софиянски също е работил с Лилков, който има 15 години опит в Столичния общински съвет (СОС).

Кандидатурата на Вили Лилков е в конкуренция с кандидата на „Продължаваме промяната“, „Демократична България“, „Спаси София“ и „Екипът на София“ Васил Терзиев. (Настойчивост за издигането му проявиха от ПП.) Предприемачът в ИТ сферата е известен като един от съоснователите на софтуерната компания „Телерик“, продадена за 262,5 млн. долара. Вместо успехът на „Телерик“ да стане катапулт за кампанията му, високопоставените му роднини в ДС се оказаха камък на шията и т.нар. градска десница се разедини. Впрочем нещо подобно се случи и през 2007 г., когато СДС и ДСБ се обединиха и издигнаха банкера Мартин Заимов срещу Бойко Борисов, но шумът около дядо му – ген. Владимир Заимов, осъден на смърт през 1942 г. заради шпионаж в полза на СССР – също навреди на кампанията. Борисов спечели още на първи тур.

„В десницата ще имат проблем между рода̀та на ДС и истински автентичното дясно“, заяви онзи ден лидерът на ГЕРБ относно кандидатурата на Терзиев. Неавтентичното дясно в лицето на ГЕРБ обаче готви изненада, обмисляйки да номинира настоящия председател на 49-тия парламент и дългогодишен политик от ГЕРБ Росен Желязков за кмет на София. Засега непотвърдена официално.

Биографията на Росен Желязков е повече от подходяща – юрист, чиято кариера е стартирала като юрисконсулт на район „Средец“ на Столичната община, заместник-кмет и секретар на Столичната община. В добавка – главен секретар на Министерския съвет, транспортен министър, бил е и шеф на Държавна агенция „Електронно управление“. Желязков е от сдържаните политици от ГЕРБ, известен и със своята диалогичност. Като главен секретар на Министерския съвет той беше единствен обвиняем и оправдан за т.нар. афера „Костинброд“, гръмнала през 2013 г. ГЕРБ бяха обвинени, че са подготвяли изборна измама с печатането на допълнителни бюлетини в печатницата в Костинброд.

За Вили Лилков и избирателите му остава задачата да „подпрат“ Желязков на евентуален балотаж с Васил Терзиев.

Победа в София – път към властта

Ако ГЕРБ наистина издигнат Желязков за кмет на София, а не например настоящия председател на СОС Георги Георгиев (аналог на кандидатурата на Цецка Цачева на президентските избори), това показва само едно: ще се сражават за столицата. От София тръгна завоевателният поход на Борисов в политиката, също и бизнес обръчите – заради разпределянето на най-големия общински бюджет. През 2005-та, когато той става кмет, бюджетът на столицата е 467 млн. лв., през 2015 г. – близо 1,5 млрд. лв., а за 2022-ра – над 2,09 млрд. лв. За сравнение, настоящите разходи на София са колкото тези на Пловдив, Варна, Бургас, Русе и Стара Загора, взети заедно, че и отгоре.

ГЕРБ си разчиства пътя към София благодарение на съюзяването си с ПП–ДБ, незаменимата си подкрепа за правителство и отстраняването на главния прокурор Иван Гешев, който от щит се превърна във враг. Вече никой от предишните ѝ политически опоненти не я нарича „мафия“, не говори за „мутри“, корупция и инхаус договори, не споменава „Барселонагейт“.

Бойко Борисов, Кирил Петков, Христо Иванов, Атанас Атанасов вече не са просто сглобка (по Лена Бориславова), която е не-коалиция, а кооперация (по Росен Желязков). Скоро те ще подпишат споразумение за съвместно управление, което ще включва и процедура за съгласуване на законопроекти и назначения в регулаторите (които започнаха преди каквато и да било рамка). Макар и сплавени с ГЕРБ–СДС и ПП–ДБ в т.нар. конституционно мнозинство, ДПС са извън споразумението – но пък участват в разпределението на позиции в регулаторите и назначения за втория ешелон. Иначе ще разтурят „конституционното мнозинство“. Излиза, че получават „кеш“ срещу неизвестен резултат – досущ като авансите по инхаус договорите, които ще бъдат индексирани, както вече се разбра.

Всичко е не-любов

Така някогашните изчегъртващи и изчегъртвани, крадците на бъдеще и освободителите ще рестартират заедно държавността. (Май ДПС имаше такъв лозунг на изборите през април 2021 г.?) Всичко е узаконено, а с времето може и любов да се появи – понякога се случва в браковете по сметка.

Как тогава ще се води кампанията за София, след като управляващата коалиция позиционира като свой противник единствено президента Румен Радев, комуто се услади да управлява чрез няколко служебни кабинета? От една страна, проруските позиции на държавния глава публично срамят България пред ЕС и нейните съюзници в НАТО, от друга – всички управляващи имат нужда от неприятел.

„Не-коалицията или сглобката ознаменува края на разделението статукво–промяна“, заяви по БНР социологът от изследователския център „Тренд“ Димитър Ганев. Според него водещото разделение вече е по линия на евроатлантизма. Ганев отбелязва, че това си е проличало особено ясно с посещението на украинският президент Зеленски, а позициите на президента Румен Радев пък са затвърдили легитимацията на настоящото управление.

В кампанията за местни избори обаче това разделение не може да се използва – освен в два-три от 24-те столични района. Засега управляващите действат по традиционния предизборен начин – всеки, който протестира, получава увеличение на заплатите в бюджет 2023, реформи няма, всичко си е постарому. Засега предизборните акции са доста беззъби, тъй като номинираните претенденти за кметския стол – Васил Терзиев, Деян Николов („Възраждане“), Вили Лилков – са сдържани в оценките си за състоянието на най-големия град след дългогодишното управление на ГЕРБ, което още не е приключило.

Битката за София не се води със снимки на изпочупени жълти плочки в социалните мрежи. Битката е срещу модела „ГЕРБ“ – иначе моделът „ГЕРБ“ ще погълне всички. За сглобките в СОС или в парламента никой няма илюзии.

AI and Microdirectives

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2023/07/ai-and-microdirectives.html

Imagine a future in which AIs automatically interpret—and enforce—laws.

All day and every day, you constantly receive highly personalized instructions for how to comply with the law, sent directly by your government and law enforcement. You’re told how to cross the street, how fast to drive on the way to work, and what you’re allowed to say or do online—if you’re in any situation that might have legal implications, you’re told exactly what to do, in real time.

Imagine that the computer system formulating these personal legal directives at mass scale is so complex that no one can explain how it reasons or works. But if you ignore a directive, the system will know, and it’ll be used as evidence in the prosecution that’s sure to follow.

This future may not be far off—automatic detection of lawbreaking is nothing new. Speed cameras and traffic-light cameras have been around for years. These systems automatically issue citations to the car’s owner based on the license plate. In such cases, the defendant is presumed guilty unless they prove otherwise, by naming and notifying the driver.

In New York, AI systems equipped with facial recognition technology are being used by businesses to identify shoplifters. Similar AI-powered systems are being used by retailers in Australia and the United Kingdom to identify shoplifters and provide real-time tailored alerts to employees or security personnel. China is experimenting with even more powerful forms of automated legal enforcement and targeted surveillance.

Breathalyzers are another example of automatic detection. They estimate blood alcohol content by calculating the number of alcohol molecules in the breath via an electrochemical reaction or infrared analysis (they’re basically computers with fuel cells or spectrometers attached). And they’re not without controversy: Courts across the country have found serious flaws and technical deficiencies with Breathalyzer devices and the software that powers them. Despite this, criminal defendants struggle to obtain access to devices or their software source code, with Breathalyzer companies and courts often refusing to grant such access. In the few cases where courts have actually ordered such disclosures, that has usually followed costly legal battles spanning many years.

AI is about to make this issue much more complicated, and could drastically expand the types of laws that can be enforced in this manner. Some legal scholars predict that computationally personalized law and its automated enforcement are the future of law. These would be administered by what Anthony Casey and Anthony Niblett call “microdirectives,” which provide individualized instructions for legal compliance in a particular scenario.

Made possible by advances in surveillance, communications technologies, and big-data analytics, microdirectives will be a new and predominant form of law shaped largely by machines. They are “micro” because they are not impersonal general rules or standards, but tailored to one specific circumstance. And they are “directives” because they prescribe action or inaction required by law.

A Digital Millennium Copyright Act takedown notice is a present-day example of a microdirective. The DMCA’s enforcement is almost fully automated, with copyright “bots” constantly scanning the internet for copyright-infringing material, and automatically sending literally hundreds of millions of DMCA takedown notices daily to platforms and users. A DMCA takedown notice is tailored to the recipient’s specific legal circumstances. It also directs action—remove the targeted content or prove that it’s not infringing—based on the law.

It’s easy to see how the AI systems being deployed by retailers to identify shoplifters could be redesigned to employ microdirectives. In addition to alerting business owners, the systems could also send alerts to the identified persons themselves, with tailored legal directions or notices.

A future where AIs interpret, apply, and enforce most laws at societal scale like this will exponentially magnify problems around fairness, transparency, and freedom. Forget about software transparency—well-resourced AI firms, like Breathalyzer companies today, would no doubt ferociously guard their systems for competitive reasons. These systems would likely be so complex that even their designers would not be able to explain how the AIs interpret and apply the law—something we’re already seeing with today’s deep learning neural network systems, which are unable to explain their reasoning.

Even the law itself could become hopelessly vast and opaque. Legal microdirectives sent en masse for countless scenarios, each representing authoritative legal findings formulated by opaque computational processes, could create an expansive and increasingly complex body of law that would grow ad infinitum.

And this brings us to the heart of the issue: If you’re accused by a computer, are you entitled to review that computer’s inner workings and potentially challenge its accuracy in court? What does cross-examination look like when the prosecutor’s witness is a computer? How could you possibly access, analyze, and understand all microdirectives relevant to your case in order to challenge the AI’s legal interpretation? How could courts hope to ensure equal application of the law? Like the man from the country in Franz Kafka’s parable in The Trial, you’d die waiting for access to the law, because the law is limitless and incomprehensible.

This system would present an unprecedented threat to freedom. Ubiquitous AI-powered surveillance in society will be necessary to enable such automated enforcement. On top of that, research—including empirical studies conducted by one of us (Penney)—has shown that personalized legal threats or commands that originate from sources of authority—state or corporate—can have powerful chilling effects on people’s willingness to speak or act freely. Imagine receiving very specific legal instructions from law enforcement about what to say or do in a situation: Would you feel you had a choice to act freely?

This is a vision of AI’s invasive and Byzantine law of the future that chills to the bone. It would be unlike any other law system we’ve seen before in human history, and far more dangerous for our freedoms. Indeed, some legal scholars argue that this future would effectively be the death of law.

Yet it is not a future we must endure. Proposed bans on surveillance technology like facial recognition systems can be expanded to cover those enabling invasive automated legal enforcement. Laws can mandate interpretability and explainability for AI systems to ensure everyone can understand and explain how the systems operate. If a system is too complex, maybe it shouldn’t be deployed in legal contexts. Enforcement by personalized legal processes needs to be highly regulated to ensure oversight, and should be employed only where chilling effects are less likely, like in benign government administration or regulatory contexts where fundamental rights and freedoms are not at risk.

AI will inevitably change the course of law. It already has. But we don’t have to accept its most extreme and maximal instantiations, either today or tomorrow.

This essay was written with Jon Penney, and previously appeared on Slate.com.

На второ четене: „Летният брат“

Post Syndicated from Стефан Иванов original https://www.toest.bg/na-vtoro-chetene-letniyat-brat/

„Летният брат“ от Яп Робен

превод от нидерландски Мария Енчева, изд. ICU, 2022

На второ четене: „Летният брат“

Може да прозвучи невероятно, но съм на мнение, че това е една от най-подходящите книги за четене през ваканцията и на плажа. Да, не е типичното жанрово и донякъде срамно читателско удоволствие, към което масово се прибягва. Няма динамичен криминален сюжет, липсват главозамайващи конспиративни обрати и неразгадаеми на пръв поглед загадки и интелектуални пъзели.

Подходящо е за плажа или планината, или където се случи да почива човек, защото всеки читател може да се огледа и да види, поне в България, какви хора не вижда около себе си. Да, на плажа има широк диапазон от представители на обществото, но наоколо не се виждат физически предизвикани хора, няма хора с увреждания, както рядко ги има и в градска среда. Те не са особено видими, а и градовете не ги приласкават и приобщават нито с лекота, нито с желание към себе си.

Тази книга разказва за това как двама братя, единият с увреждания, и баща им прекарват лятото си.

Книгата е брутално нежна, без да е елементарно сантиментална и без манипулативно да натиска бутоните за трогване. Не, книгата е етическа и сложна, детайлна и реалистична. Не се гнуси от това да представи пълнокръвно, дълбоко и сериозно какви условия и възможности имат маргинализирани и бедни хора в богата страна като Нидерландия.

Робен не пише тази история, за да унижава националния дух на страната си с критични и непатриотични писания. Не романтизира страданието или уврежданията, а разказва именно за да разширява полето на видимото за тези, които не живеят такъв живот. За да им отвори очите и сърцата, за да ги накара да излязат от ергономичния си и предвидим комфорт и да видят, че има и друго. И това друго не е константа, то търпи промяна и може да бъде различно.

Социална и културна промяна чрез политики на човещина.

Писателското намерение на нидерландския писател не е нито случайно, нито преднамерено, а е продиктувано от личен опит и човешка необходимост. Не е плод на абстрактен хуманизъм, модна тенденция или кабинетна хрумка. Писането на Робен е пронизано от честност. Родителите му работят в институция за психично здраве и още като дете той вижда какво е някой да е изоставен и сам. И този опит, и това знание са предадени майсторски.

Неслучайно произведението му е номинирано за Международната награда „Букър“.

Робен е писателско въплъщение на думите

чувам тези гласове, които няма да бъдат удавени,

които пише Монтегю Слейтър в либретото си на операта „Питър Граймс“ от Бенджамин Бритън. Робен чува и вижда бедните и болните и им връща с пълна сила цялото им пълнокръвно достойнство и сложност. Героите му не искат снизхождение, те не са жертви, а са мощни и интересни, противоречиви и многостранни. В постоянно търсене на начини да се погрижат за себе си, някои – в реда на нещата и според правилата, други – напълно нередни. От пазаруване на вересия до откровени дребни кражби и кокошкарски схеми с преотдаване на фургони под наем, с измами и експлоатация на социални политики.

На второ четене: „Летният брат“

Двамата братя Люсиен и Брайън са деца, чиито разведени родители несъзнателно играят на прехвърляне на отговорност. Основната сюжетна линия в романа е за това как Люсиен се връща от дома за специализирани грижи за хора с увреждания, за да прекара лятото с баща си, и главната грижа и водещата роля е поета от малкия му брат. Брайън, с опити и грешки, постепенно открива у себе си търпение и добрина. Постепенно разбира, че кутиите с етикети, в които хората с лекота биват разпределяни като предмети и в които остават затворени завинаги, всъщност са предразсъдък, продиктуван от страх и отказ от разбиране. Кутиите са глупави и едноизмерни. В тях има не невинност и приемане, а отказ животът да бъде приет в сложното му многообразие.

За едно лято Брайън се научава не само да мисли, но и да живее извън кутията, в която е възпитан. За разлика от баща си, който поне на пръв поглед отказва да се промени и остава засрамен, незрял, безотговорен и уплашен.

Непрекъснато трябва да се припомня, че диапазонът на това какво е човек и какво е човешко, е доста широк и не е човешко да бъде стесняван.

Това е лакмус за развитието на дадено общество. При дебати кое е естествено и нормално, може да се даде бърз и ясен отговор: каквото може да се наблюдава, има го и е съвместимо с живота, е естествено и нормално. Някои от естествените и нормалните неща, за съжаление, не са добри.

Търпението, грижата и съчувствието са едни от най-лесните неща на света. Това е така, защото хората са способни да очовечат всичко. Коли, котки, облаци, картофи, компютри или тостери. Уви, още по-лесно е обезчовечаването. Дехуманизицията и агресивното самодоволно определяне кой е човек, кой е половин или четвърт човек и кой е цял, кой е пълноценен и кой непълноценен, кой е човек и нечовек, е дебат, който няма място. Той е решен. Цивилизационно, културно и правосъдно. И това решение трябва да се отстоява постоянно, за да не се повтарят катастрофи от миналото.

Винаги е време да се изтегли нов таван за кредит на човещина и да се увеличава перото в бюджета за доброта. Това напомня този безкомпромисен роман. Това е маркер за цивилизованост, защото свободата не е само свободно да купуваш или продаваш и да си правиш каквото ти е приятно. Културата е и самоограничение, а това значи най-общо да не вършиш лоши неща, да не нарушаваш правилата и да носиш лична отговорност.

В ирисите му е изрисувана Вселената. Мама все така казваше. Може да не умеел много неща, но пък носел Вселената в очите си.

Емпатията има граници и е добре те да се уважават. Не можеш да си в обувките на другия, но можеш да знаеш, че е друг, да знаеш точно защо е друг, и да не затваряш очите си за това, да му подадеш ръка, да го обичаш или поне да ти пука. Защото една от формите на болка не е само физическа, има и

особена форма на безутешност и тя е да се усещаш невидим,

очите да те отбягват поради невежество или ограниченост. Човешкото естество невинаги е приемливо или лесно, но това не означава, че не съществува или че не заслужава нормално всекидневно отношение и условия за живот. И да, има разлика между лицемерие, политическа коректност и доброта.

В интерес на истината, всеки, който живееше тук, беше наполовина динозавър. Ала уврежданията им бяха толкова различни, че всеки сам по себе си като че ли беше последният представител на вида си.

Брайън смята брат си за динозавър, за нещо безследно изчезнало пред очите му, за липса, запълваща пространството, за създание, което не принадлежи към човешкия вид. В рамките на едно лято заедно той налучква пътя към Люсиен, но и към себе си, и преминава през лабиринт от грешки, безсилие и гняв, за да достигне до нещо неподозирано, воден от интуицията на сърцето си.

Ако в тоя момент по пътя над нас мине кола, шофьорът ѝ ще съзре между дърветата две момчета, нагазили до колене в потока. И ще му се стори, че се прегръщат.
– Ти си ми брат – казвам аз. – Братя сме.

Може и да не е случайно, че Рутгер Брегман, автор на „Утопия за реалисти“ и Humankind: A Hopeful History, е нидерландец, също като Робен. В книгите си той предлага нов подход в борбата с бедността и неравенството и гледа от перспективата, че повечето хора са добри и само една част не са решили наистина да го проявят, не са решили да действат всекидневно като такива. Гледната му точка е противоотрова срещу безпощадния цинизъм и безсилното повдигане на рамене.

В един момент книгата се сроди пред очите ми и с „Братята с лъвски сърца“ на Астрид Линдгрен. Идеята за изключителната история на тези други двама братя се поражда, когато шведската писателка наблюдава един фантастичен изгрев с розова светлина над езеро:

… да, беше нещо с неземна красота, и изведнъж имах силно преживяване, нещо като видение за изгряващата светлина на човечеството, и почувствах, че нещо се запалва отвътре.

Писах този текст на 17 юли, когато имен ден има Марин Бодаков, а една от неговите най-лични книги е именно тази на Линдгрен. Марин има забележителен текст, обяснение в любов към Астрид и писането ѝ:

Вече 36 години съм възгрозен, възглупав и възпъзлив Шушулко, но нямам смел, лъчезарен и любящ брат като Йонатан Лъвското сърце, нямам никого.

Нямам кой да скочи с мен през прозореца, за да ме спаси от горящата къща. Нямам с кого да скоча в пропастта, за да го избавя от парализата – и да се веселим в блажената Нангилима.

Имам дъщеря, съпруга, родители, приятели, но брат, брат си нямам.

Чел съм „Братята с лъвски сърца“ за пръв път на 8, но едва сега разбирам разкритото в романа двойно дъно на смъртта, между което кипят приключения с насилие, красота и доблест.

Затова, когато дойде време да скоча през смъртта, Астрид Лъвското сърце ще ме помилва по бузата и ще литна в двойната бездна, хванал за ръка точно тази книга.

– О, Нангилима! Да, Йонатан, да! Виждам светлинката. Виждам светлинката.

Писане като това на Робен (автор, прочее, и на детски книги) и Линдгрен ни кара да видим светлината, да съзрем братята и сестрите си на хартия и да очакваме изгрев, а не залез за човечеството. Приятно четене на плажа и дано нещо се запали отвътре, докато кожата е пощадена от изгаряне.


Активните дарители на „Тоест“ получават постоянна отстъпка в размер на 20% от коричната цена на всички заглавия от каталога на ICU, както и на няколко други български издателства в рамките на партньорската програма Читателски клуб „Тоест“. За повече информация прочетете на toest.bg/club.

Никой от нас не чете единствено най-новите книги. Тогава защо само за тях се пише? „На второ четене“ е рубрика, в която отваряме списъците с книги, публикувани преди поне година, четем ги и препоръчваме любимите си от тях. Рубриката е част от партньорската програма Читателски клуб „Тоест“. Изборът на заглавия обаче е единствено на авторите – Стефан Иванов и Севда Семер, които биха ви препоръчали тези книги и ако имаше как веднъж на две седмици да се разходите с тях в книжарницата.

Amazon Route 53 Resolver Now Available on AWS Outposts Rack

Post Syndicated from Sébastien Stormacq original https://aws.amazon.com/blogs/aws/amazon-route-53-resolver-now-available-on-aws-outposts-rack/

Starting today, Amazon Route 53 Resolver is now available on AWS Outposts rack, providing your on-premises services and applications with local DNS resolution directly from Outposts. Local Route 53 Resolver endpoints also enable DNS resolution between Outposts and your on-premises DNS server. Route 53 Resolver on Outposts helps to improve your on-premises applications availability and performance.

AWS Outposts provides a hybrid cloud solution that allows you to extend your AWS infrastructure and services to your on-premises data centers. This enables you to build and operate hybrid applications that seamlessly integrate with your existing on-premises infrastructure. Your applications deployed on Outposts benefit from low-latency access to on-premises systems. You also get a consistent management experience across AWS Regions and your on-premises environments. This includes access to the same AWS management tools, APIs, and services that you use when managing AWS services in a Region. Outposts uses the same security controls and policies as AWS in the cloud, providing you with a consistent security posture across your hybrid cloud environment. This includes data encryption, identity and access management, and network security.

One of the typical use cases for Outposts is to deploy applications that require low-latency access to on-premises systems, such as factory equipment, high-frequency trading applications, or medical diagnosis systems.

DNS stands for Domain Name System, which is the system that translates human-readable domain names like “example.com” into IP addresses like “93.184.216.34” that computers use to communicate with each other on the internet. A Route 53 Resolver is a component that is responsible for resolving domain names to IP addresses.

Until today, applications and services running on an Outpost forwarded their DNS queries to the parent AWS Region the Outpost is connected to. But remember, as Amazon CTO Dr Werner Vogels says: everything fails all the time. There can be temporary site disconnections—think about fiber cuts or weather events. When the on-premises facility becomes temporarily disconnected from the internet, local DNS resolution fails, making it difficult for applications and services to discover other services, even when they are running on the same Outposts rack. For example, applications running locally on the Outpost won’t be able to discover the IP address of a local database running on the same Outpost, or a microservice won’t be able to locate other microservices running locally.

Starting today, when you opt in for local Route 53 Resolvers on Outposts, applications and services will continue to benefit from local DNS resolution to discover other services—even in a parent AWS Region connectivity loss event. Local Resolvers also help to reduce latency for DNS resolutions as query results are cached and served locally from the Outposts, eliminating unnecessary round-trips to the parent AWS Region. All the DNS resolutions for applications in Outposts VPCs using private DNS are served locally.

In addition to local Resolvers, this launch also enables local Resolver endpoints. Route 53 Resolver endpoints are not new; creating inbound or outbound Resolver endpoints in a VPC has been available since November 2018. Today, you can also create endpoints inside the VPC on Outposts. Route 53 Resolver outbound endpoints enable Route 53 Resolvers to forward DNS queries to DNS resolvers that you manage, for example, on your on-premises network. In contrast, Route 53 Resolver inbound endpoints forward the DNS queries they receive from outside the VPC to the Resolver running on Outposts. It allows sending DNS queries for services deployed on a private Outposts VPC from outside of that VPC.

Let’s See It in Action
To create and test a local Resolver on Outposts, I first connect to the Outpost section of the AWS Management Console. I navigate to the Route 53 Outposts section and select Create Resolver.

Create local resolver on outpost

I select the Outpost on which I want to create the Resolver and enter a Resolver name. Then, I select the size of the instances to deploy the Resolver and the number of instances. The selection of instance size impacts the performance of the Resolver (the number of resolutions it can process per second). The default is an m5.large instance able to handle up to 7,000 queries per second. The number of instances impacts the availability of the Resolver, the default is four instances. I select Create Resolver to create the Resolver instances.

Create local resolver - choose instance type and number

After a few minutes, I should see the Resolver status becoming ✅ Operational.

Local resolver is operationalThe next step is to create the Resolver endpoint. Inbound endpoints allow to forward external DNS queries to the local Resolver on the Outpost. Outbound endpoints allow to forward locally initiated DNS queries to external DNS resolvers you manage. For this demo, I choose to create an inbound endpoint.

Under the Inbound endpoints section, I select Create inbound endpoint.

Local resolver - create inbound endpoint

I enter an Endpoint name, I choose the VPC in the Region to attach this endpoint to, and I select the previously created Security group for this endpoint.

Create inbound endpoint details

I select the IP address the endpoint will consume in each subnet. I can select to Use an IP address that is selected automatically or Use an IP address that I specify.

Create inbound endpoint - select an IP addressFinally, I select the instance type to bind to the inbound endpoint. The larger the instance, the more queries per second it will handle. The service creates two endpoint instances for high availability.

When I am ready, I select the Create inbound endpoint to start the creation process.

Create inbound endpoint - select the instance type

After a few minutes, the endpoint Status becomes ✅ Operational.

Create inbound endpoint sttaus operational

The setup is now ready to test. I therefore SSH-connect to an EC2 instance running on the Outpost, and I test the time it takes to resolve an external DNS name. Local Resolvers cache queries on the Outpost itself. I therefore expect my first query to take a few milliseconds and the second one to be served immediately from the cache.

Indeed, the first query resolves in 13 ms (see the line ;; Query time: 13 msec).

➜  ~ dig amazon.com

; <<>> DiG 9.16.38-RH <<>> amazon.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 35859
;; flags: qr rd ra; QUERY: 1, ANSWER: 3, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
;; QUESTION SECTION:
;amazon.com.			IN	A

;; ANSWER SECTION:
amazon.com.		797	IN	A	52.94.236.248
amazon.com.		797	IN	A	205.251.242.103
amazon.com.		797	IN	A	54.239.28.85

;; Query time: 13 msec
;; SERVER: 10.0.0.2#53(10.0.0.2)
;; WHEN: Sun May 28 09:47:27 CEST 2023
;; MSG SIZE  rcvd: 87

And when I repeat the same query, it resolves in zero milliseconds, showing it is now served from a local cache.

➜  ~ dig amazon.com

; <<>> DiG 9.16.38-RH <<>> amazon.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 63500
;; flags: qr rd ra; QUERY: 1, ANSWER: 3, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
;; QUESTION SECTION:
;amazon.com.			IN	A

;; ANSWER SECTION:
amazon.com.		586	IN	A	54.239.28.85
amazon.com.		586	IN	A	205.251.242.103
amazon.com.		586	IN	A	52.94.236.248

;; Query time: 0 msec
;; SERVER: 10.0.0.2#53(10.0.0.2)
;; WHEN: Sun May 28 09:50:58 CEST 2023
;; MSG SIZE  rcvd: 87

Pricing and Availability
Remember that only the Resolver and the VPC endpoints are deployed on your Outposts. You continue to manage your Route 53 zones and records from the AWS Regions. The local Resolver and its endpoints will consume some capacity on the Outposts. You will need to provide four EC2 instances from your Outposts for the Route 53 Resolver and two other instances for each Resolver endpoint.

Your existing Outposts racks must have the latest Outposts software for you to use the local Route 53 Resolver and the Resolver endpoints. You can raise a ticket with us to have your Outpost updated (the console will also remind you to do so when needed).

The local Resolvers are provided without additional cost. The endpoints are charged per elastic network interface (ENI) per hour, as is already the case today.

You can configure local Resolvers and local endpoints in all AWS Regions where Outposts racks are available, except in AWS GovCloud (US) Regions. That’s a list of 22 AWS Regions as of today.

Go and configure local Route 53 Resolvers on Outposts now!

— seb

 

P.S. We’re focused on improving our content to provide a better customer experience, and we need your feedback to do so. Please take this quick survey to share insights on your experience with the AWS Blog. Note that this survey is hosted by an external company, so the link does not lead to our website. AWS handles your information as described in the AWS Privacy Notice.

Implementing patterns that exit early out of a parallel state in AWS Step Functions

Post Syndicated from Benjamin Smith original https://aws.amazon.com/blogs/compute/implementing-patterns-that-exit-early-out-of-a-parallel-state-in-aws-step-functions/

This post is written by Madhav Vishnubhatta, Senior Technical Account Manager, Enterprise Support.

This blog post explains how to implement patterns in AWS Step Functions that control the break out of a parallel state as soon as a minimum requirement is met. The parallel state usually completes only when all the parallel flows inside it are completed. But if you do not want to wait for all of the parallel flows to complete before moving to the next step, this post provides patterns to help implement this functionality.

You can use AWS Step Functions to set up visual serverless workflows that orchestrate and coordinate multiple AWS services into a serverless workflow. This allows you to build complex, stateful, and scalable applications without managing the underlying infrastructure. In Step Functions, the individual steps are called states.

Step Functions offers multiple types of states. Some states help control the logic of the workflow. For example, the choice state enables conditional logic to control the flow to any one of the multiple possible next states, depending on the conditions defined in the state. The parallel state helps control the logic, but rather than choose one of multiple next states (as the choice state does), the parallel state allows all the branches to run as parallel flows concurrently. When all the parallel flows are complete, control moves on to the Parallel state’s next state.

Patterns that do not need to wait for all parallel flows to finish

Consider a scenario where the Step Functions workflow represents the process of an employee requesting a laptop in your organization. The process begins with a request from the employee as the first step, but the approval of this request could come from either of two IT managers.

In this case there could be two parallel flows, each waiting for an approval from one IT manager. But, as soon as one person provides approval, the workflow can move forward to the next step of actually issuing a laptop to the employee. This is an “either-or” pattern.

Consider a similar use-case but with a slightly different requirement. Instead of just one person’s approval being enough to issue a laptop, what if approval is needed from a minimum of two out of three IT managers before the laptop is issued. This is the “quorum” pattern.

The parallel state does not directly support these two patterns because the state waits for all the flows to complete. In this case, that means all the managers must provide an approval before a laptop can be issued.

Solution overview

Step Functions provides an error handling mechanism with the fail state, which can be used to fail the workflow with an error. This error can be caught downstream in the workflow and handled as needed. Both the either-or and the quorum patterns can be implemented with this fail state along with the error handling capability.

In case of either-or, as soon as the parallel flow is finished, the fail state can throw an error, which is caught outside the parallel state for further processing. Even though it is the fail state, it might not represent an error scenario in your use-case.

The quorum pattern needs an additional mechanism to store the status of each parallel flow, using an Amazon DynamoDB table. The quorum pattern creates an item in the DynamoDB table at the beginning of the workflow that is updated by each parallel flow as soon as it has completed. Each parallel flow checks the DynamoDB table to look at the number of processes that have completed and compare it against the quorum. If the quorum is met, that flow raises an error with a fail state that can be caught outside the parallel step.

Prerequisites

Both of these patterns are published on Serverless Land:

To deploy and use these patterns, you need:

  1. An AWS Account
  2. Access to login as a user or assume a role that can:
  3. Familiarity with AWS Serverless Application Model (AWS SAM).
  4. AWS SAM Command Line Interface installed.

Example walkthrough

Either-or pattern

To deploy the Either-or pattern, follow the deployment Instructions section in the GitHub repo. This deployment creates the following resources:

  1. A Step Functions workflow.
  2. An IAM role that is assumed by the Step Functions workflow during execution.

Navigate to the AWS CloudFormation page in the AWS Management Console and choose the stack with the name provided during deployment. Choose the State Machine resource in the Resources section of the CloudFormation stack to go to the Step Functions console. Choose Edit and then choose WorkflowStudio to see a visual representation of the workflow.

You can see the exported workflow in the GitHub repo. This is the logic of the workflow:

Either-or patter. Conceptual flow.

  1. There are three (numbered) parallel flows in this workflow.
  2. Flows #1 and #2 are the main parallel flows, one of which completing should move the control to outside the Parallel state.
  3. Flow #3 is the time out flow so that the workflow can exit after a set amount of time if neither of the other two parallel flows complete by then.
  4. Each of the two main parallel flows follows the following logic:
    • Wait for the process to complete. This is a filler and can be replaced with your business logic on how to monitor process completion. This could be a human approval, or any other job that needs to finish.
    • Once process is complete, throw a dummy error, which moves control to outside the parallel state.
  5. The dummy errors for the two flows are caught outside the parallel state with corresponding catch condition.
  6. The errors from the two flows need not be caught separately. You might just do the same action no matter which of the parallel flows finished, but I show separate steps in case you need to do something different based on which parallel flow finished.

To test the workflow, follow the instructions provided in the Testing section of the README file at the GitHub repo.

To clean up the resources created, run:

sam delete

Quorum pattern

To deploy the Quorum pattern, follow the Deployment Instructions section in the GitHub repo. This deployment creates the following resources:

  1. A Step Functions workflow.
  2. An IAM role that is assumed by the Step Functions workflow during execution.
  3. A DynamoDB Table called “QuorumWorkflowTable”.

Navigate to CloudFormation in the AWS Management Console and choose the stack with the name provided during deployment. Choose the state machine resource in the Resources section of the CloudFormation stack to go to the Step Functions console.

Choose Edit and then choose WorkflowStudio to see a visual representation of the workflow.

You can see the the exported workflow in the GitHub repo. This is the logic of the workflow:

Quorum pattern. Conceptual flow.

  1. The first step creates an entry in the DynamoDB table with the execution ID of the workflow’s execution. This item in the table tracks the completion of processes.
  2. The next state is the parallel state, which has three parallel flows and a fourth time out flow. All the four flows are numbered.
  3. Flow #1, #2, and #3 are the main parallel flows, two of which completing should move the control to outside the parallel state.
  4. Flow #4 is the timeout flow so that the workflow can exit after a set amount of time, if neither of the other two parallel flows complete by then.
  5. Each of the three main parallel flows uses the following logic:
    • Wait for the process to complete.
    • Once complete, update the DynamoDB table entry to mark the completion of the process.
    • After the update, query the item from DynamoDB to get the list of processes that have completed and check if the quorum has been met.
    • If the quorum has been met, raise an “Error” (which is actually a success criterion in terms of business case), to move the control to outside the parallel state.

To test the workflow, follow the instructions provided in the Testing section of the README file at the GitHub repo.

To clean up the resources created, run:

sam delete

Conclusion

This blog post shows how you can implement patterns that must exit early out of a parallel state in an AWS Step Functions workflow.

The use-cases for this approach are not limited to these two patterns. More complicated use-cases like having different combinations of conditions to exit a parallel state can all be implemented using parallel and fail states.

Visit Serverless Land for more Step Functions workflow patterns.

The collective thoughts of the interwebz